Keep info for longer without losing existing data?

Post by squeak »

Hi Everyone,

Been using Cacti for around a year now, with quite a few customised pollers for a lot of our equipment here (gas pressures, temperatures, voltages, flow rates etc.). Quite a lot of our pollers are on 30 second polls due to the type of data we need, and we've always used the standard RRAs for displaying the various time periods.

However, something recently came up where we needed to see what a sensor had been recording several months ago, and the data isn't there. Due to the increased polling frequency of some of our pollers we only have 3 months of data for many of them (when viewing the 'Yearly' graphs.)
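The three-month limit follows from how rrdtool sizes an archive: each RRA covers step × steps-per-row × rows seconds. A minimal sketch of that arithmetic, assuming the stock yearly RRA of 797 rows at 288 steps per row (the values that show up in the `rrdtool info` output further down this thread) and assuming Cacti's default 300-second step for comparison. `rra_span_days` is just an illustrative helper name:

```python
SECONDS_PER_DAY = 86400

def rra_span_days(step_s, pdp_per_row, rows):
    """Days of history one RRA can hold before it wraps around."""
    return step_s * pdp_per_row * rows / SECONDS_PER_DAY

default_span = rra_span_days(300, 288, 797)  # 5-minute polling (assumed default)
fast_span = rra_span_days(30, 288, 797)      # 30-second polling, as in this thread

print(f"300 s step: {default_span:.1f} days")
print(f"30 s step:  {fast_span:.1f} days")
```

So the same 797 rows that hold roughly 2.2 years at a 5-minute step hold only about 80 days at a 30-second step.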

So my question is twofold: firstly, how do we go about getting Cacti to store data for longer (say 2 years), and secondly, how do we 'update' the existing installation to store data for longer periods of time without losing the data we currently have?

Any advice is much appreciated!

Cheers,
Last edited by squeak on Mon Nov 29, 2010 10:48 am, edited 1 time in total.

Post by BSOD2600 »

This is due to rrdtool consolidation. See http://docs.cacti.net/manual:087 and scroll to the bottom for more info.

Post by squeak »

Hi there,

Thanks very much for the info. Unfortunately my experience with Cacti so far has been entirely on the custom pollers side and I know little (or nothing) about the rrd/rra aspects of it. I've had a read through that section and I can see that I need to create an RRA with a larger row count and timespan, which I have now done. I attached it to the existing data template and I'm now getting a new graph which shows me 2 years. Which is good!

However, the data in the new graph is still disappearing as time goes by.

I.e. currently the data goes back to 08/08/2010 @ 9pm, but 2 hours ago the graph was showing data back to 6pm on the same day, so it is still clearing up data at the same rate.

The new RRA is as follows:

Steps: 300
Rows: 1152000
Timespan: 66100000

And a datasource debug shows it, for example:

Code:

Data Source Debug

/usr/bin/rrdtool create \
/var/lib/cacti/rra/cdc-chiller01_snmp_oid_436.rrd \
--step 30  \
DS:snmp_oid:GAUGE:600:-1000:1000 \
RRA:AVERAGE:0.5:1:500 \
RRA:AVERAGE:0.5:1:600 \
RRA:AVERAGE:0.5:6:700 \
RRA:AVERAGE:0.5:24:775 \
RRA:AVERAGE:0.5:288:115200 \
RRA:AVERAGE:0.5:300:1152000 \
RRA:MAX:0.5:1:500 \
RRA:MAX:0.5:1:600 \
RRA:MAX:0.5:6:700 \
RRA:MAX:0.5:24:775 \
RRA:MAX:0.5:288:115200 \
RRA:MAX:0.5:300:1152000

The graph it produces correctly shows me approx. 2 years (give or take, I've just estimated the values for now), but as I mentioned above it is still clearing up data at the same rate.
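For sizing the new archive, rrdtool's own arithmetic is rows = timespan / (step × steps per row). A rough sketch, assuming the `--step 30` and the 300 steps per row from the debug output above and a two-year target of 730 days. How Cacti itself treats the Rows field for non-default steps is not covered here, and `rows_needed` and `span_years` are illustrative names:

```python
import math

def rows_needed(target_days, step_s, pdp_per_row):
    """Smallest row count whose span covers target_days."""
    seconds_per_row = step_s * pdp_per_row
    return math.ceil(target_days * 86400 / seconds_per_row)

def span_years(rows, step_s, pdp_per_row):
    """Years of history that `rows` rows would cover."""
    return rows * step_s * pdp_per_row / (365 * 86400)

two_year_rows = rows_needed(730, 30, 300)
configured_years = span_years(1152000, 30, 300)

print(two_year_rows)
print(round(configured_years, 1))
```

By that arithmetic about 7008 rows would cover two years at 9000 seconds per row, while 1152000 rows would span centuries. Neither figure changes an RRD file that already exists: as gandalf notes later in the thread, associating a new RRA with the template has no effect on existing files.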

Can you point me at what I'm not understanding, please? Clearly I'm doing something wrong but I'm not quite sure what.

cheers!

Post by gandalf »

In the very same documentation section, you will find a script to automatically resize existing rrd files to your needs.
R.

Post by squeak »

Hi There,

Thanks very much for that, and sorry for missing it in the first place! :roll:

Right then. As a test I added some rows to the first [0] section of an rrd, as per the instructions, and it now shows as follows:

Code:

ds[traffic_in].type = "COUNTER"
ds[traffic_out].type = "COUNTER"
rra[0].cf = "AVERAGE"
rra[0].rows = 8600 <- Increased from 600
rra[1].cf = "AVERAGE"
rra[1].rows = 700
rra[2].cf = "AVERAGE"
rra[2].rows = 775
rra[3].cf = "AVERAGE"
rra[3].rows = 797
rra[4].cf = "MAX"
rra[4].rows = 8600 <- Increased from 600
rra[5].cf = "MAX"
rra[5].rows = 700
rra[6].cf = "MAX"
rra[6].rows = 775
rra[7].cf = "MAX"
rra[7].rows = 797

Which is exactly what I was expecting. I think this will now keep 8600 datapoints under the first RRA? (Not my intention in the long run, just a test to see if this works.)

Next, I updated the RRD target for the graph which the original RRD was attached to, and generated the graph.

Unfortunately, the graph no longer shows 'previous' data, but it is recording new data starting from now.

So, i guess i'm still missing something? :(

Post by gandalf »

squeak wrote: Which is exactly what I was expecting. I think this will now keep 8600 datapoints under the first RRA? (Not my intention in the long run, just a test to see if this works.)
Yes.

squeak wrote: Next, I updated the RRD target for the graph which the original RRD was attached to, and generated the graph. Unfortunately, the graph no longer shows 'previous' data, but it is recording new data starting from now.
What exactly did you do when you "updated the RRD target for the graph"?
R.

Post by squeak »

Hi,

The particular RRD records a single SNMP value (water temperature in this case), so for ease I just did the following, in order:

1) Duplicated an existing datasource (console->datasource->duplicate)
2) Edited the 'new' datasource and replaced the RRD it generated with my newly 'modified' one (edited the Data Source Path and pointed it to the modified RRD)
3) Edited an existing graph to reference the new data source (graph management -> Data Source 1 -> Selected datasource from step 1)

Since it didn't work I'm obviously doing something wrong :)

Thanks for your help!!

Post by gandalf »

squeak wrote: 2) Edited the 'new' datasource and replaced the RRD it generated with my newly 'modified' one (edited the Data Source Path and pointed it to the modified RRD)
Why? As you said, the rrd was modified. So the "old" data source still points to the now modified rrd. Why did you duplicate the data source, then? Do you assume that both (quite identical) data sources are polled now?

In fact, the script was written with the intention of doing an in-place resize.
R.

Post by squeak »

Hi,

The RRD I modified was a copy of the live RRD (in case I screwed it up and lost data), so I needed a copy of the data source to point to the new RRD as a test. The idea was that I touched nothing of the original setup, so if I destroyed the RRD or the data source then nothing was lost.

So, are you saying that by doing this with a copy of an existing data source (and a copy of an RRD) I am causing the historic data to go missing?

Cheers!

Post by gandalf »

No. In case you copied the rrd file and the data source/graph info, you're nearly there. What is still missing is making the poller aware of the second rrd file. Your method does not touch the poller commands; that's why the second rrd won't get updated.
R.

Post by squeak »

Hi, OK, that's odd because it is indeed being updated; it is just that the historic data goes missing and the graphs produced from the RRD just start from the moment I make the change to them.

Can you confirm then that by running that resize script on an existing RRD I should not be losing any of the data previously contained within it?

Cheers!

Post by gandalf »

squeak wrote: Hi, OK, that's odd because it is indeed being updated; it is just that the historic data goes missing and the graphs produced from the RRD just start from the moment I make the change to them. Can you confirm then that by running that resize script on an existing RRD I should not be losing any of the data previously contained within it?
I can confirm, as I have already run that on several thousand rrd files. But you will of course have to make sure that space is available. Nevertheless, try it on a few rrd files first to get used to it.
R.
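
On the disk space point, here is a back-of-the-envelope sketch. It assumes each archived value takes 8 bytes (rrdtool stores values as doubles, which are 8 bytes on typical platforms) and ignores the file header. The 1000 added rows and the 5000 files are made-up example figures, and `extra_bytes` is an illustrative helper, not part of the resize script:

```python
BYTES_PER_VALUE = 8  # assumption: one double per archived value

def extra_bytes(added_rows, ds_count=1, rra_count=1):
    """Approximate growth of one RRD file when rows are added."""
    return added_rows * ds_count * rra_count * BYTES_PER_VALUE

per_file = extra_bytes(added_rows=1000)  # one RRA, one data source
fleet = per_file * 5000                  # hypothetical 5000 files

print(f"{per_file} bytes per file")
print(f"{fleet / 1024**2:.1f} MiB for 5000 files")
```

Growing a single archive is cheap per file; the total only becomes noticeable with many files, many data sources per file, or very large row counts.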

Post by squeak »

Hi,

Thanks for that. I've just given it a go with a couple of live files and will have to wait until tomorrow to see if the oldest data goes missing when it does the daily RRA. On the plus side, the existing data going back 6 months has remained in place and it is still graphing and updating correctly, which is cool! :)

I have one final question which you might be able to assist with. I'm trying to understand the relationship between selecting "Associated RRAs" within a Data Template, and the effect (if any) on an existing RRD file if you change which RRAs are associated with a data template AFTER it has been created.

So for example, say I create a data template and select all 5 of the existing RRAs I currently have. Then in the future I create a new RRA and select it as an additional "Associated RRA" within the existing data template. What happens? I can see that it now generates a new graph based on the timespan specified within the new RRA, but does it actually have any effect on the RRD file itself? Most importantly, will it cause any problems with the rolling-up of the data?

I've read the relevant manual sections several times over now and I'm just not quite 'getting' how this relationship works, sorry!

Thanks again,

Post by gandalf »

squeak wrote: I have one final question which you might be able to assist with. I'm trying to understand the relationship between selecting "Associated RRAs" within a Data Template, and the effect (if any) on an existing RRD file if you change which RRAs are associated with a data template AFTER it has been created.
Nothing will happen with existing files. This is partly due to missing support from rrdtool utilities and partly "not yet implemented".

squeak wrote: So for example, if I create a data template, and select all 5 of the existing RRAs I currently have. Then in the future I create a new RRA, and I then select it as an additional "Associated RRA" within an existing data template. What happens? I can see that it now generates a new graph based on the timespan specified within the new RRA, but does it actually have any effect on the RRD file itself?
It will have no effect at all (with the exception of graphing a 5th timespan), as you already found out.

squeak wrote: Most importantly, will it cause any problems with the rolling-up of the data?
It will not cause "problems". The rrd file simply stays as it is today.
R

Post by squeak »

Hi,

OK, I've hit a bit of a problem with growing some of the files: I get a very odd output after growing them. To try and assist you in assisting me, here is everything I have done.

rrdtool info output of the original file, working fine and before any changes:

Code:

rrd_version = "0003"
step = 30
last_update = 1292001483
ds[snmp_oid].type = "GAUGE"
ds[snmp_oid].minimal_heartbeat = 600
ds[snmp_oid].min = -1.0000000000e+03
ds[snmp_oid].max = 1.0000000000e+03
ds[snmp_oid].last_ds = "60"
ds[snmp_oid].value = 1.8000000000e+02
ds[snmp_oid].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 600
rra[0].cur_row = 382
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 700
rra[1].cur_row = 272
rra[1].pdp_per_row = 6
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = 0.0000000000e+00
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 775
rra[2].cur_row = 543
rra[2].pdp_per_row = 24
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = 7.0500000000e+02
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[3].cf = "AVERAGE"
rra[3].rows = 797
rra[3].cur_row = 399
rra[3].pdp_per_row = 288
rra[3].xff = 5.0000000000e-01
rra[3].cdp_prep[0].value = 3.5552000000e+03
rra[3].cdp_prep[0].unknown_datapoints = 0
rra[4].cf = "MAX"
rra[4].rows = 600
rra[4].cur_row = 574
rra[4].pdp_per_row = 1
rra[4].xff = 5.0000000000e-01
rra[4].cdp_prep[0].value = NaN
rra[4].cdp_prep[0].unknown_datapoints = 0
rra[5].cf = "MAX"
rra[5].rows = 700
rra[5].cur_row = 33
rra[5].pdp_per_row = 6
rra[5].xff = 5.0000000000e-01
rra[5].cdp_prep[0].value = 6.0000000000e+01
rra[5].cdp_prep[0].unknown_datapoints = 0
rra[6].cf = "MAX"
rra[6].rows = 775
rra[6].cur_row = 481
rra[6].pdp_per_row = 24
rra[6].xff = 5.0000000000e-01
rra[6].cdp_prep[0].value = 6.0000000000e+01
rra[6].cdp_prep[0].unknown_datapoints = 0
rra[7].cf = "MAX"
rra[7].rows = 797
rra[7].cur_row = 157
rra[7].pdp_per_row = 288
rra[7].xff = 5.0000000000e-01
rra[7].cdp_prep[0].value = 6.5000000000e+01
rra[7].cdp_prep[0].unknown_datapoints = 0
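
To read output like this at a glance, the span of each archive can be computed as step × pdp_per_row × rows. A minimal sketch that parses an excerpt of the output above (AVERAGE archives only, other fields omitted); `rra_spans` is an illustrative helper and the regular expressions assume the `key = value` layout shown here:

```python
import re

# Excerpt of the `rrdtool info` output shown above (AVERAGE archives only).
INFO_TEXT = """\
step = 30
rra[0].cf = "AVERAGE"
rra[0].rows = 600
rra[0].pdp_per_row = 1
rra[1].cf = "AVERAGE"
rra[1].rows = 700
rra[1].pdp_per_row = 6
rra[2].cf = "AVERAGE"
rra[2].rows = 775
rra[2].pdp_per_row = 24
rra[3].cf = "AVERAGE"
rra[3].rows = 797
rra[3].pdp_per_row = 288
"""

def rra_spans(info_text):
    """Return {rra_index: span_in_seconds} parsed from `rrdtool info` text."""
    step = int(re.search(r"^step = (\d+)$", info_text, re.M).group(1))
    fields = {}
    for idx, key, value in re.findall(
            r"^rra\[(\d+)\]\.(rows|pdp_per_row) = (\d+)$", info_text, re.M):
        fields.setdefault(int(idx), {})[key] = int(value)
    return {i: step * f["pdp_per_row"] * f["rows"] for i, f in fields.items()}

spans = rra_spans(INFO_TEXT)
for index, seconds in sorted(spans.items()):
    print(f"rra[{index}]: {seconds / 3600:.1f} hours")
```

For this file the longest archive, rra[3], works out to 6886080 seconds, a little under 80 days.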

After running a 'grow' command as follows:
# perl /var/lib/resize.pl -f <sourcefile> -r 3 -o <outputfile> -g 1000

Code:

rrd_version = "0003"
step = 30
last_update = 1292001483
ds[snmp_oid].type = "GAUGE"
ds[snmp_oid].minimal_heartbeat = 600
ds[snmp_oid].min = -1.0000000000e+03
ds[snmp_oid].max = 1.0000000000e+03
ds[snmp_oid].last_ds = "60"
ds[snmp_oid].value = 1.8000000000e+02
ds[snmp_oid].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 600
rra[0].cur_row = 382
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 700
rra[1].cur_row = 272
rra[1].pdp_per_row = 6
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = 0.0000000000e+00
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 775
rra[2].cur_row = 543
rra[2].pdp_per_row = 24
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = 7.0500000000e+02
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[3].cf = "AVERAGE"
rra[3].rows = 1797                         <--- has 'grown' correctly
rra[3].cur_row = 399
rra[3].pdp_per_row = 288
rra[3].xff = 5.0000000000e-01
rra[3].cdp_prep[0].value = 3.5552000000e+03
rra[3].cdp_prep[0].unknown_datapoints = 0
rra[4].cf = "MAX"
rra[4].rows = 600
rra[4].cur_row = 574
rra[4].pdp_per_row = 1
rra[4].xff = 5.0000000000e-01
rra[4].cdp_prep[0].value = NaN
rra[4].cdp_prep[0].unknown_datapoints = 0
rra[5].cf = "MAX"
rra[5].rows = 700
rra[5].cur_row = 33
rra[5].pdp_per_row = 6
rra[5].xff = 5.0000000000e-01
rra[5].cdp_prep[0].value = 6.0000000000e+01
rra[5].cdp_prep[0].unknown_datapoints = 0
rra[6].cf = "MAX"
rra[6].rows = 775
rra[6].cur_row = 481
rra[6].pdp_per_row = 24
rra[6].xff = 5.0000000000e-01
rra[6].cdp_prep[0].value = 6.0000000000e+01
rra[6].cdp_prep[0].unknown_datapoints = 0
rra[7].cf = "MAX"
rra[7].rows = 797
rra[7].cur_row = 157
rra[7].pdp_per_row = 288
rra[7].xff = 5.0000000000e-01
rra[7].cdp_prep[0].value = 6.5000000000e+01
rra[7].cdp_prep[0].unknown_datapoints = 0

Graphs from before and after are attached. Hopefully you can see the odd behaviour!
I have scaled the before graph to match the after graph so that you can see a direct comparison. The 'after' graph is showing ALL the data.

Oh, and just to clarify, the only data contained within this file is the orange line. The other 4 are sourced from other RRD files which I have not modified; they just happen to be rendered on the same graph.

Finally, just to clarify what I'm actually trying to do (in case I'm going about it completely the wrong way): all I want to do is take the 1 day average (lowest resolution) and extend the period for which it is kept, so that the rrd stops dropping data from the 'back' of time. Currently it is clearing up data from mid-September, as you can see, so I just want that data to not be erased as time goes on.
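
As a rough check on the numbers, here is a sketch of how many rows the lowest-resolution archive would need for two years. It assumes the step of 30 and the pdp_per_row of 288 from the `rrdtool info` output above and a 730-day target. Note that at a 30-second step, 288 steps per row works out to 2.4 hours per row rather than a full day. The row count it prints is only what this arithmetic suggests, not a tested `-g` setting:

```python
STEP_S = 30          # from the rrdtool info output above
PDP_PER_ROW = 288    # rra[3] and rra[7]
CURRENT_ROWS = 797   # before the resize
TARGET_DAYS = 730    # assumed two-year target

seconds_per_row = STEP_S * PDP_PER_ROW
hours_per_row = seconds_per_row / 3600

# Ceiling division keeps the result an integer.
target_rows = -(-TARGET_DAYS * 86400 // seconds_per_row)
rows_to_add = target_rows - CURRENT_ROWS

# Span of rra[3] after the `-g 1000` grow shown above (797 + 1000 rows).
grown_days = (CURRENT_ROWS + 1000) * seconds_per_row / 86400

print(f"{hours_per_row} hours per row")
print(f"rows needed: {target_rows}, rows to add: {rows_to_add}")
print(f"span after -g 1000: {grown_days:.1f} days")
```

The resize shown above used `-r 3`, so only the AVERAGE archive grew; rra[7] (MAX) still reports 797 rows in the output.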

Hopefully you can tell me what I'm doing wrong! :)

Thanks SO MUCH for your help, it is genuinely appreciated!!
Attachments: 02-after.jpg (graph after the resize), 01-before.jpg (graph before the resize)