Sanity check on RRA settings

dune
Posts: 38
Joined: Tue Oct 24, 2006 4:19 pm
Location: Dallas, TX

Post by dune »

I'm working on new RRA settings based on a 1-minute polling interval and could use a sanity check while I wait for data to collect.

My goal is to retain the 1-minute data for a year so that I can go back to any time frame within the last year and generate the same graphs I can see today (yes, I know the rrd's will be large). Since I may also need to generate graphs of varying time frames within the last year, I am also attempting to retain 5 minute, 30 minute and 2 hour averages for a year. Lastly, the 1-day averages should be kept for 5 years.

I've been over the "0.8.7 and 1 minute polling" post but as this is my first attempt the comfort factor isn't entirely there. Here are my current settings:

Name                         Steps        Rows      Timespan
Hourly (1 Minute Average)    1            525600    14400
Daily (5 Minute Average)     5            105120    86400
Weekly (30 Minute Average)   30           17520     604800
Monthly (2 Hour Average)     120          4380      2678400
Yearly (1 Day Average)       1440         1825      31536000
Any confirmation and/or recommendations would be appreciated. Also, is there any way to apply these settings to existing graphs moving forward, or must I recreate all my graphs?
dune
Cacti v0.8.7b/Spine v0.8.7c-beta2 (PA 2.1, Settings 0.5, THold 0.3.9) on Server 2003 SP2/IIS 6.0/PHP5/MySQL 5.0.45
eschoeller
Cacti User
Posts: 234
Joined: Mon Dec 13, 2004 3:03 pm

Post by eschoeller »

I am very interested to hear from the developers or other users if your RRA definitions are correct. You are basically looking for the same retention I am looking for. I am going to apply this to a new installation however (for home use). My installation at the office currently uses the default RRAs, and I wanted to change that, but it seems difficult to do.

See this post:
http://forums.cacti.net/about27585.html
dune
Posts: 38
Joined: Tue Oct 24, 2006 4:19 pm
Location: Dallas, TX

Post by dune »

Thus far it appears to be working but I believe the best test will be to go back at least 1-2 weeks. I re-created some graphs yesterday afternoon and went back to 15:00 to 16:00 from yesterday and was able to still see the 1-minute samples.

I went and checked the RRD sizes and it looks like the file size is pre-allocated which is nice. Using these settings, a single data source with one item uses 20MB of space. My RRA folder is now 2GB.
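
As a rough cross-check on the 20MB figure, the sketch below sums the rows of the five RRAs from the first post and multiplies by 8 bytes per stored value. The count of four consolidation functions per RRA is an assumption (the thread does not state it), and the RRD header is ignored, so treat the result as an estimate only.

```python
# Rough RRD size estimate for one data source item.
# Assumptions (not from the thread): each stored value is an 8-byte double,
# every RRA is kept for 4 consolidation functions, header overhead is ignored.
ROWS = [525600, 105120, 17520, 4380, 1825]  # rows per RRA, from the first post
BYTES_PER_VALUE = 8
ASSUMED_CF_COUNT = 4  # hypothetical, e.g. AVERAGE, MIN, MAX, LAST


def estimate_rrd_bytes(rows, cf_count=ASSUMED_CF_COUNT, ds_count=1):
    """Return the approximate data size in bytes for the given RRA row counts."""
    return sum(rows) * cf_count * ds_count * BYTES_PER_VALUE


size = estimate_rrd_bytes(ROWS)
print(f"{size} bytes, about {size / 1024 / 1024:.1f} MiB")
```

Under those assumptions the estimate lands close to the observed file size, which suggests the row counts, not any per-file overhead, dominate the space used.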

Here is my logic on the RRA settings to achieve this, and my goal here is to have these confirmed to avoid the need to lose data and start over at a later date.

Hourly (1 Minute Average)
Step: 1, baseline from the 1-minute polling interval
Rows: 525600, number of rows (or entries) to retain 1-minute samples for a year. There are (60 per hour * 24 hours per day * 365 days) minutes in a year.
Timespan: Not sure on this one, this is the Cacti default which is 4-hours. I believe this should be 3600 though.

Daily (5 Minute Average)
Step: 5, takes 5 1-minute samples to get a 5-minute average
Rows: 105120, there are (12 per hour * 24 hours per day * 365 days) 5-minute periods in a year.
Timespan: 86400, "daily graph" thus there are this many seconds in a day.

Weekly (30 Minute Average)
Step: 30, takes 30 1-minute samples to get a 30-minute average
Rows: 17520, there are (48 per day * 365 days) 30-minute periods in a year.
Timespan: 604800, "weekly graph" thus there are this many seconds in a week.

Monthly (2 Hour Average)
Step: 120, takes 120 1-minute samples to get a 2-hour average
Rows: 4380, there are (12 per day * 365 days) 2-hour periods in a year.
Timespan: 2678400, "monthly graph" thus there are this many seconds in a 31-day month.

Yearly (1 Day Average)
Step: 1440, takes 1440 1-minute samples to get a 1-day average
Rows: 1825, this is to retain for 5-years. 365 days * 5 years = 1825
Timespan: 31536000, "yearly graph" thus there are this many seconds in a year.
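
The row counts above can be recomputed with a few lines of Python. This is only a sanity-check sketch: the function name `rows_for` is made up for illustration, and it assumes a 1-minute base step and 365-day years, as in the reasoning above.

```python
# Recompute the Rows column from the step (in 1-minute samples) and retention.
MINUTES_PER_DAY = 24 * 60


def rows_for(step_minutes, retention_days):
    """Number of consolidated rows needed to cover retention_days."""
    return retention_days * MINUTES_PER_DAY // step_minutes


# (name, step in 1-minute samples, retention in days), taken from the post above
settings = [
    ("Hourly (1 Minute Average)", 1, 365),
    ("Daily (5 Minute Average)", 5, 365),
    ("Weekly (30 Minute Average)", 30, 365),
    ("Monthly (2 Hour Average)", 120, 365),
    ("Yearly (1 Day Average)", 1440, 5 * 365),
]

computed = {name: rows_for(step, days) for name, step, days in settings}
for name, rows in computed.items():
    print(f"{name:28} {rows}")
```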
eschoeller
Cacti User
Posts: 234
Joined: Mon Dec 13, 2004 3:03 pm

Post by eschoeller »

That logic looks sound. From what I understand the timespan should not matter much; it just defines the length of time for the default views.

Does it take noticeably longer to generate graphs with such large RRD files?
I worry about my aggregate graphs which pool 10-20 Data Sources together. That may put a lot of strain on the hardware.

What kind of machine are you running cacti on? How many data sources do you have?
dune
Posts: 38
Joined: Tue Oct 24, 2006 4:19 pm
Location: Dallas, TX

Post by dune »

I have not noticed any impact in graph generation time, although time will tell once the RRD's begin to fill with meaningful data instead of just zeros.

The server is pretty decent, an HP dual-core Xeon @ 2.0 GHz w/ 4GB memory. I have about 2200 data sources but only about 50 have been reconfigured with the new RRA settings. I'm hoping to find a way where I don't have to re-create everything but that may turn out to be the best route.
dune
Posts: 38
Joined: Tue Oct 24, 2006 4:19 pm
Location: Dallas, TX

Post by dune »

I believe I've hit my first snag with these aggressive RRA settings. Since my initial post, I have slowly been re-adding my datasources and have gone from 50 to about 500 of roughly 2300 total. In the process, my poller runtime has gone from 12-15 seconds to 50-70 seconds.

With a 1-minute poller time this is obviously starting to cause gaps in my graphs. Although the RRD files are larger, I don't see how this has had such an impact on poller performance.

I'm going to try switching to spine but since I have yet to identify a specific bottleneck I'm not sure what the results will be.

Should large RRD's affect the poller and if so, why?
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

The poller consists of two parts (as a first approximation)
- the data gathering via cmd.php or spine
- the rrdtool update itself

See the Announcement Forums for some figures on spine performance improvements and other hints. Expect a performance increase of an order of magnitude when using the latest spine instead of cmd.php.

In fact, the time taken by rrdtool update does not depend on the rrd file size (at least approximately). But there are other issues to be taken into account:
- file system performance
- SAN/NAS/RAID
- OS: Linux, Windows, *BSD
- support for fadvise on newer Linux kernels
- tweaking disk cache queue length may help (thanks to 3.CCC.eu participants for this one)
- memory for file caching
- using at least rrdtool 1.2.23, or better, 1.3.4

Surely, I left things out due to lack of free random access storage in my brain :wink:
Reinhard
dune
Posts: 38
Joined: Tue Oct 24, 2006 4:19 pm
Location: Dallas, TX

Post by dune »

After further research I believe I am battling disk performance issues. It has been the case for a while now but is just more noticeable with the larger RRD files that Cacti has to work with.

I'll post my results after this bottleneck is resolved.
moon1234
Posts: 3
Joined: Thu Oct 16, 2008 1:49 pm

Post by moon1234 »

If you set these values for steps and rows per RRA, then isn't this dependent on how often your data sources are polled? I.e., if I poll a device every 300 seconds (5 minutes) instead of every 60 seconds, won't I then have 25 minutes' worth of time being represented for the 5 minute average? I come to this conclusion because you selected a step of 5 for your 5 minute average. This would mean consolidating 5 steps (300 second intervals) into a single point.

This could turn into a nightmare unless all of your datasources are polled with the same frequency. Some datasources I want to poll every 300 seconds instead of every minute. Some I only want to poll once an hour (Like disk space usage.).

How does RRD differentiate between the device poll time and the rra step to generate the proper graph?
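
The arithmetic behind that concern can be sketched in a few lines. The helper name `consolidated_seconds` is hypothetical; the only assumption is that one consolidated row covers the polling interval multiplied by the RRA's steps value.

```python
# Time covered by one consolidated row = polling interval x RRA steps.
def consolidated_seconds(poll_interval_s, rra_steps):
    """Seconds of data averaged into a single consolidated point."""
    return poll_interval_s * rra_steps


# The same "steps = 5" RRA applied to data sources polled at different rates.
for poll in (60, 300, 3600):
    minutes = consolidated_seconds(poll, 5) // 60
    print(f"poll every {poll:>4}s, steps=5 -> one point per {minutes} minutes")
```

With a 300-second poll the "5 Minute Average" RRA does indeed hold 25-minute averages, so an RRA defined in steps only means what its name says for one polling interval.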

There has got to be a simple document somewhere that explains all this.
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

Yes.
That's why we withdrew the full multiple polling thingy from 0.8.7. The way we want to cope with it deviates a bit from the rrdtool way of thinking.

1. We want to introduce something like RRA Templates, thinking in units of "timespans", not in "number of pdp's"
2. When creating a graph, we will want to let you select the correct polling interval.
3. From the polling interval and the RRA template we want to compute the correct amount of pdp's/cdp's required for the given timespan/interval
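
The computation described in the three points above might look roughly like the following. This is an illustrative sketch, not Cacti code: the function names `rra_for` and `rra_line` are invented here, and the 0.5 x-files factor in the generated line is an assumed value. The `RRA:CF:xff:steps:rows` layout itself is the one rrdtool create uses.

```python
# Derive steps and rows from a polling interval and a timespan-based template.
YEAR = 365 * 86400  # seconds


def rra_for(poll_interval_s, resolution_s, timespan_s):
    """Return (steps, rows) for an RRA with the given resolution and retention."""
    if resolution_s % poll_interval_s != 0:
        raise ValueError("resolution must be a multiple of the polling interval")
    steps = resolution_s // poll_interval_s  # pdp's per consolidated point
    rows = -(-timespan_s // resolution_s)    # ceiling division: cdp's to keep
    return steps, rows


def rra_line(cf, steps, rows, xff=0.5):
    """Format an RRA definition as passed to rrdtool create (xff is assumed)."""
    return f"RRA:{cf}:{xff}:{steps}:{rows}"


# One year of 5-minute averages, for a 60-second and a 300-second poller.
for poll in (60, 300):
    steps, rows = rra_for(poll, 300, YEAR)
    print(poll, rra_line("AVERAGE", steps, rows))
```

The template stays the same ("5-minute resolution, kept for a year"); only the steps value changes with the polling interval.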

This way we hope to make it clearer to the user.
But be aware: There's no chance to change the polling interval later on. That's due to the fact that rrdtool does not allow you to "tune" the interval.

Reinhard
moon1234
Posts: 3
Joined: Thu Oct 16, 2008 1:49 pm

Post by moon1234 »

Thanks. After several bleary-eyed nights I figured I would just need multiple RRA schedules to marry up with my data source poll intervals. I wound up including the data source poll interval as part of the name of the RRA so I could keep track of which RRA to use with each data source based on the poll interval.

Once I got that concept straight in my head, it was easy, but tedious, to set things up.

I am a little worried about upgradeability to anything later than .87b if this whole process changes. I like the idea you had above, but it would be a new concept again that may require dumping existing RRD's.
chrissie
Posts: 1
Joined: Mon Oct 20, 2008 8:35 am
Location: Cardiff

Post by chrissie »

Hi all, I am sure that I am missing something REALLY obvious here but searching and searching has pickled my brain!

Here is what I am trying to do:
I need to create graphs to collate data displayed as follows:

Daily (5 min interval) - using default config
Weekly (5 min interval)
Monthly (5 min interval)

I have set this up but the graphs are showing the same data for Weekly - 30 min and Weekly - 5 min, and also for Monthly - 2 hour and Monthly - 5 min. I am sure I am missing something really obvious here but cannot seem to work it out. I am not particularly technical, but then this isn't really a technical job as such, so it should be relatively easy, but do you think I can figure it out??

I don't know if I am looking at it the wrong way, but when I download the data in CSV format from Weekly 30 min and Weekly 5 min, the data recorded is identical: it is recorded at intervals of 30 mins. Surely I should see it recorded at 5 min intervals for my new entries? Or is this irrelevant?

Any help you can give me would be gratefully received as I am on a deadline to get this done and seem to be losing the plot!

Cheers

a VERY bewildered newbie
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

chrissie wrote:Hi all, I am sure that I am missing something REALLY obvious here but searching and searching has pickled my brain!

Here is what I am trying to do:
I need to create graphs to collate data displayed as follows:

Daily (5 min interval) - using default config
Weekly (5 min interval)
Monthly (5 min interval)
Don't feel ashamed, but yes, this is a common mistake. Unfortunately, the cacti way of defining rra's makes this mistake even more common.
To make it simple:
There is NO daily 5 min rra and weekly 5 min rra and monthly 5 min rra.
There is simply only ONE 5 min rra out there. You may want to increase the number of data points at this consolidation level (well, consolidation is almost a NOP for 5 min).

Please see http://docs.cacti.net/?q=node/75 for more guidance
Reinhard