[HOWTO] 0.8.7 and 1 minute polling

Post by **zeki** » Tue Nov 06, 2007 3:13 am

I saw in another topic that the cron should still run at 5-minute intervals. Cacti will automatically split the intervals if you choose 1 min, but you still need to set up the RRAs.

You have to set up the data templates and graph templates for 1 min, because they are set up for 5 min. Then you will have to recreate the graphs.

Post by **david(dallas)** » Tue Nov 06, 2007 9:47 am

zeki wrote: You have to set up the data templates and graph templates for 1 min, because they are set up for 5 min. Then you will have to recreate the graphs.

Please provide more specifics. From what I know, I believe I have set up the data templates for 1 minute (but maybe I'm still missing something), and I don't see anything in the graph templates where you would select an interval or step. That is probably what I'm missing: how to do these things, especially the graph templates.

BIG thanks in advance.

Post by **TheWitness** » Tue Nov 06, 2007 10:20 am

There are a number of posts on this. However, if you set the data source step to 3600, it will only be polled once an hour. If you set it to 60, it will be polled every minute, and if you set it to 10, it will be polled every 10 seconds.

TheWitness

Post by **tekbot** » Fri Nov 09, 2007 9:10 am

Hello Everyone. First off, I want to say that 0.8.7 appears to be an excellent release. A number of quirks and bugs have been fixed, and the new scheduling is working *beautifully*. Excellent Job Developers!

I wanted to post my configuration for sub-1-minute polling intervals in 0.8.7 to help some of you out, since this can be a pretty confusing topic for newcomers who want more granularity in their graphs. First, a bit of explanation.

Number 1 - Cacti 0.8.7 ships with the ability to poll at intervals as granular as 10 seconds without requiring the Plugin Architecture or any patches. The implementation differs from all 0.8.6 releases (I cannot speak for the later 0.8.6 SVN builds). The main difference is in the scheduling: regardless of the polling interval you want, keep your cron entry at 5 minutes! In 0.8.7 (ONLY), the polling interval is selected on the Console -> Settings -> Poller page. Setting the cron to anything other than 5 minutes will cause gaps in your graphs!
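To picture why the cron entry stays at 5 minutes, here is a rough Python model of one cron invocation being divided into shorter poller passes ("cacti will automatically split the intervals", as zeki put it). The function name and the equal-slot model are assumptions for illustration, not Cacti's actual poller.php logic.

```python
def poller_offsets(cron_interval: int, poller_interval: int) -> list:
    """Offsets (seconds after cron starts poller.php) at which each poller
    pass would begin, if one cron run is split into equal slots.

    Hypothetical model for illustration; this is not Cacti's poller code.
    """
    if poller_interval <= 0 or cron_interval % poller_interval != 0:
        raise ValueError("poller interval must divide the cron interval evenly")
    runs = cron_interval // poller_interval
    return [i * poller_interval for i in range(runs)]


# 5-minute cron, 1-minute poller: five passes per cron run
print(poller_offsets(300, 60))       # [0, 60, 120, 180, 240]
# 5-minute cron, 10-second poller: thirty passes per cron run
print(len(poller_offsets(300, 10)))  # 30
```

Under this model a single 5-minute cron run already covers every 1-minute slot, which is why the sub-minute interval is chosen in Settings -> Poller rather than in the crontab.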

Number 2 - While native sub-1-minute polling intervals are packaged in this new release, no RRAs or Data Templates are provided for any of these intervals. I do a lot of customization with Cacti and have built a number of new RRAs and Data Templates to work with the new granular poller. Below, I have posted my RRA configurations and Data Template configurations, as well as an explanation of how they work.

It should be noted that the settings below are only provided as an example. These settings work well for me, but I'm running cacti on a very powerful dedicated server that has sufficient resources to poll a large number of devices at very low intervals, and sufficient storage to hold large amounts of granular data over long periods of time. Use the templates below as an example, but make sure you have sufficient resources at your disposal. In other words -- USE AT YOUR OWN RISK

Here are my Custom RRAs (found under Console -> Data Sources -> RRAs)

Code:

Name / Steps / Rows / Timespan
5 min - 5 Minute Average - 24 Hour View / 1 / 864 / 86400
(10 sec) - 10 Second Average - 24 Hour View / 1 / 25920 / 86400
1 min - 1 Minute Average - 24 Hour View / 1 / 4320 / 86400
5 min - 5 Minute Average - 7 Day View / 1 / 8928 / 604800
(10 Sec) - 1 Minute Average - 7 Day View / 6 / 44640 / 604800
1 min - 1 Minute Average - 7 Day View / 1 / 44640 / 604800
5 min - 5 Minute Average - 1 Month View / 1 / 25920 / 2678400
1 min - 5 Minute Average - 1 Month View / 5 / 25920 / 2678400
(10 sec) - 5 Minute Average - 1 Month View / 30 / 25920 / 2678400
1 min - 30 Minute Average - 1 Year View / 30 / 35040 / 31536000
(10 Sec) - 30 Minute Average - 1 Year View / 180 / 35040 / 31536000
5 min - 30 Minute Average - 1 Year View / 6 / 35040 / 31536000
(10 sec) - 2 Hour Average - 3 Year View / 720 / 13140 / 94608000
1 min - 2 Hour Average - 3 Year View / 120 / 13140 / 94608000
5 min - 2 Hour Average - 3 Year View / 24 / 13140 / 94608000

So, what does all that mean? The RRA configurations can take some time to wrap your head around, but hopefully this post can provide some clarity.

NAME: This can be anything, but it should be clear. I've added a prefix for each of the intervals I want to graph at. This will come in handy when we create/configure our new Data Templates.

Step: This is defined as "how many data points are needed to put data into the RRA", i.e. how many times a device should be polled at a given interval before its output is averaged and put into the RRA as a value. If you're polling once per minute but define your Step value as 5, you're going to get 5-minute averages and will not get the granularity you are looking for. Thus, for each of my custom 24-hour-view RRAs, I've defined Step as 1. This makes the 10 second graphs graph once every 10 seconds, the 1 minute graphs every minute, and the 5 minute graphs every 5 minutes. Easy, right?

Moving along to the week view, I've defined the Step value according to the average I want. For the 10 second graphs, I want my week view to be averaged on 1 minute values, so I set my Step value to 6 (6 polls * 10s = 60s). I want consistency for my 1 minute graphs, so I want their week average to be 1 minute as well; thus, I set the Step value for the 1 minute graphs to 1. The 5 minute graph can only display at 5 minute intervals, so its week view will also be 5 minutes and its Step value will be 1. Still with me? Good!

The Month View: For the 10 second graphs, I'll be averaging values over 5 minutes. So, what should my step be? Well, we need 6 polls per minute over 5 minutes to get the average, so, that should be set to 30. The 1 minute graphs will also be averaged over 5 minutes for the month view. So, we have 1 poll per minute for 5 minutes so the Step should be 5. For the 5 minute graphs, our step remains 1 (1 poll per 5 minutes).

The Year View: My year view is a 30 minute average. Let's figure out the Step values for each of the Polling Intervals.
10sec: 6 (polls per minute) * 30 (desired average) = 180.
1min: 1 (poll per minute) * 30 (desired average) = 30
5min: 6 polls per 30 minutes, so, 30/5 = 6

I want to make sure that I have good data for a LONG time, so I created a 3 year RRA as well. I think with the above the STEP values should be self-explanatory, so, we'll move on.
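The Step arithmetic above boils down to "averaging period divided by poll interval", with the period never finer than one poll. A minimal Python sketch, assuming the averaging periods tekbot chose for each view (the dictionary and function names are illustrative only):

```python
# Averaging period (seconds) wanted for each view, as chosen in the post.
# 0 means "native resolution" (one poll per row).
AVERAGE_PER_VIEW = {"24h": 0, "7d": 60, "1mo": 300, "1y": 1800, "3y": 7200}


def rra_steps(poll_interval: int, average: int) -> int:
    """RRA Steps = number of polls averaged into one row.

    A view can never be finer than the poll interval, so the averaging
    period is clamped to at least one poll.
    """
    period = max(average, poll_interval)
    if period % poll_interval != 0:
        raise ValueError("averaging period must be a multiple of the poll interval")
    return period // poll_interval


for interval in (10, 60, 300):
    steps = [rra_steps(interval, avg) for avg in AVERAGE_PER_VIEW.values()]
    print(interval, steps)
# 10 [1, 6, 30, 180, 720]
# 60 [1, 1, 5, 30, 120]
# 300 [1, 1, 1, 6, 24]
```

The printed rows match the Steps column of the RRA table for the 10 sec, 1 min and 5 min prefixes.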

Rows: The Rows value defines how many consolidated values (rows) each RRA holds, with each row covering Steps polls. This defines "the width of the rolling window", in other words, how long data is kept in the RRA before it drops off. As mentioned above, my settings may well differ from what would work for you. The settings above define the following for each polling interval:

Code:

Interval / View / Rows / Storage Duration
10s / 24h / 25920 / 72h
1m / 24h / 4320 / 72h
5m / 24h / 864 / 72h

10s / 7d / 44640 / 31d
1m / 7d / 44640 / 31d
5m / 7d / 8928 / 31d

10s / 1mo / 25920 / 90d
1m / 1mo / 25920 / 90d
5m / 1mo / 25920 / 90d

10s / 1y / 35040 / 2y
1m / 1y / 35040 / 2y
5m / 1y / 35040 / 2y

I'll break down the 24 hour view and leave the rest of the math to you guys. For the 10 second poller 24 hour view, I want to keep 3 days worth of data at 10s granularity. So, the question is "How many steps are there in 72 hours?" The answer is ROWS. So, we have 6 polls per minute, 60 minutes per hour times 72 hours, or (6 * 60 * 72), or 25920. For the 1 minute poller 24 hour view I want 3 days worth of data at 1m granularity, so that equation is going to be (1 * 60 * 72) or 4320. For the 5 minute poller 24 hour view I want 3 days worth of data at 5m granularity so (12 * 72) = 864 (12 polls per hour times 72 hours).
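The same Rows arithmetic as a short Python sketch (function names are illustrative, not part of Cacti): rows = retention / (poll interval x Steps), plus the inverse, which tells you how long an existing RRA actually keeps data.

```python
HOUR = 3600
DAY = 24 * HOUR


def rra_rows(poll_interval: int, steps: int, retention: int) -> int:
    """Rows needed to keep `retention` seconds when each row spans
    poll_interval * steps seconds."""
    row_span = poll_interval * steps
    if retention % row_span != 0:
        raise ValueError("retention is not a whole number of rows")
    return retention // row_span


def retention_days(poll_interval: int, steps: int, rows: int) -> float:
    """Inverse check: how many days a given RRA actually keeps."""
    return poll_interval * steps * rows / DAY


print(rra_rows(10, 1, 72 * HOUR))      # 25920
print(rra_rows(60, 1, 72 * HOUR))      # 4320
print(rra_rows(300, 1, 72 * HOUR))     # 864
print(rra_rows(10, 6, 31 * DAY))       # 44640
print(retention_days(60, 30, 35040))   # 730.0
print(retention_days(60, 120, 13140))  # 1095.0
```

Running the inverse on each RRA before saving it is a cheap way to confirm the storage duration you expect, and therefore the disk space you are committing.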

Timespan: Sweet, the last one! Timespan is the easy one. It simply defines how much time each graph covers. You want your 24 hour view to show exactly 24 hours, right? And your year view to show all 12 months? So, all you need to do is determine the number of seconds for each of these. In case you're scared of math, I've listed them below:

Code:

24 hours = 86400s
7 days = 604800s
31 days = 2678400s
1 year = 31536000s
3 years = 94608000s
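A small Python sketch that derives the timespans listed above (using 365-day years, as the post does) and adds one sanity check, based on the assumption that an RRA should keep at least as much data as the timespan it is meant to display. The names are illustrative.

```python
DAY = 86400

TIMESPANS = {
    "24 hours": DAY,
    "7 days": 7 * DAY,
    "31 days": 31 * DAY,
    "1 year": 365 * DAY,
    "3 years": 3 * 365 * DAY,
}


def covers_timespan(poll_interval: int, steps: int, rows: int, timespan: int) -> bool:
    """True if the RRA holds at least `timespan` seconds of data."""
    return poll_interval * steps * rows >= timespan


# Prints the same five values as the list above
for name, seconds in TIMESPANS.items():
    print(f"{name} = {seconds}s")

print(covers_timespan(10, 1, 25920, TIMESPANS["24 hours"]))  # True  (72h kept)
print(covers_timespan(60, 1, 1000, TIMESPANS["24 hours"]))   # False (about 16.7h kept)
```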

So, that's how you create custom RRAs, but how do you put data into them? Well, now we need to (re)define some data templates and tell them to populate these RRAs. I recommend doing the following on a NEW installation of Cacti 0.8.7, in a development environment. Once again Use at your own risk.

First of all, I should say that I wouldn't use anything more granular than 1 minute for SNMP polling intervals. You can, but I wouldn't recommend it. SNMP uses UDP and can be impacted by network latency, high resource utilization on the polling and/or destination hosts, authentication in SNMP v3, etc. This can easily result in gaps in your graphs (which will impact all of your averages and take away from the granularity you're seeking). In addition, SNMP queries use CPU on the target hosts, so if you're hitting your hosts too often, you may be adversely affecting the performance of the hosts you want to monitor. My 10 second graphs are all for custom NRPE counters and internal metrics of a custom application we run in my environment. Thus, the example below configures 1 minute Data Templates for SNMP queries.

The most commonly polled metric is Network Interface Traffic (bits/s), so this is the data template we're going to modify.

* Go to Console -> Data Templates and find the Interface - Traffic Data Template. Check its box, select Duplicate at the bottom, and click "Go". Name the copy something like "Interface - Traffic 5 minute" or "Interface - Traffic (Default/Backup)".
* Now go into the original Interface - Traffic template. If you like, you can add the suffix "(1 min)" to its name, but that's up to you. (I like modifying the Data Template that shipped with Cacti so it stays the default Interface Data Template when I create new hosts. If you'd rather, you can duplicate the original Data Template and modify the copy; this requires duplicating the associated Graph Template and changing its associated Data Templates.)
* Choose the Associated RRAs (hold Ctrl to select multiple RRAs). Since I created all of my custom RRAs with prefixes, I select all the RRAs that start with "1 min - ".
* Since it's a 1 minute graph, set the Step value to 60. (Note: this "Step" differs from the RRA "Step" discussed above. This one is the interval, in seconds, at which data is expected to be added to the data source.)
* For the traffic_in Data Source, change the Heartbeat to 120 (this should always be double your Data Source Step value). Click Save.
* Go back into the Interface - Traffic (1 min) Data Template, click the traffic_out Data Source, and change its Heartbeat as well. This is a very important step! You must change the Heartbeat setting for ALL data sources. Click Save again, and you're done.
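For readers who think in rrdtool terms, these Data Template settings map roughly onto an `rrdtool create` call: the Data Template Step becomes `--step`, each data source becomes a `DS:` definition carrying its heartbeat, and each associated RRA becomes an `RRA:` line. The Python sketch below only builds that argument list; Cacti creates the .rrd file itself, and the AVERAGE/MAX pair, the min/max limits (0 and U) and the file name are assumptions for illustration.

```python
def rrdtool_create_args(path, step, data_sources, rras, xff=0.5):
    """Build the argument list for an equivalent `rrdtool create` call.

    data_sources: list of (name, type) pairs, e.g. ("traffic_in", "COUNTER")
    rras: list of (steps, rows) pairs, emitted once per consolidation function
    The heartbeat is twice the step, following the rule of thumb above.
    """
    heartbeat = 2 * step
    args = ["rrdtool", "create", path, "--step", str(step)]
    for name, ds_type in data_sources:
        # DS:name:type:heartbeat:min:max  ("U" means no upper limit)
        args.append(f"DS:{name}:{ds_type}:{heartbeat}:0:U")
    for cf in ("AVERAGE", "MAX"):
        for steps, rows in rras:
            # RRA:consolidation function:xff:steps:rows
            args.append(f"RRA:{cf}:{xff}:{steps}:{rows}")
    return args


# The "1 min - " RRAs from the table above, as (steps, rows)
ONE_MINUTE_RRAS = [(1, 4320), (1, 44640), (5, 25920), (30, 35040), (120, 13140)]

args = rrdtool_create_args(
    "traffic_1min.rrd",  # hypothetical file name
    60,
    [("traffic_in", "COUNTER"), ("traffic_out", "COUNTER")],
    ONE_MINUTE_RRAS,
)
print(" ".join(args))
```

Comparing this against `rrdtool info` on a freshly created Cacti .rrd file is a quick way to confirm that the step and heartbeats actually took effect.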

Now, you can create a new Interface Graph for any given host and get 1 minute values!

One last thing to note: on my 0.8.7 installation, after changing the Data Template to 1 minute, I needed to "relink" (or "remind") Cacti to associate the Interface - bits/sec Graph Template with the modified Interface Data Template. Otherwise, I would get NaN. I had created some interface graphs before and changed my RRA settings around, so this may not be necessary for you, but here's what I did to fix it:
* Clear the SNMP Poller Cache before creating new graphs.
* Go into Console -> Graph Templates and edit Interface - bits/sec. Click each of the Graph Items, and simply click Save.
* Delete any existing interface graphs showing NaN, and recreate them for the device.

I hope this clears up some of the confusion around getting Cacti to poll at 1 minute intervals. Here's a quick recap:

* Create new RRAs to store data at 1 minute (and lower) intervals
* Create new / edit existing Data Templates to hold data at 1 minute (and lower) intervals
* Relink existing Graph Templates with the modified Data Template
* When in doubt, clear the Poller Cache, recreate the graphs, look at the log file, rrdtool dump the file, and post to the forum!
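When you "rrdtool dump the file", the quickest thing to check is how many rows in each RRA are NaN. A minimal Python sketch, assuming the usual dump XML layout (rra, cf, pdp_per_row, database, row, v); the SAMPLE string is a trimmed, made-up dump, not real output from this thread.

```python
import xml.etree.ElementTree as ET


def unknown_rows_per_rra(dump_xml: str) -> dict:
    """Count rows whose values are all NaN, per RRA, in `rrdtool dump` XML.

    Keys are "index:CF:pdp_per_row"; values are (unknown_rows, total_rows).
    Assumes the layout rrd > rra > (cf, pdp_per_row, database > row > v).
    """
    root = ET.fromstring(dump_xml)
    result = {}
    for index, rra in enumerate(root.findall("rra")):
        cf = (rra.findtext("cf") or "").strip()
        pdp = (rra.findtext("pdp_per_row") or "").strip()
        rows = rra.findall("database/row")
        unknown = sum(
            1
            for row in rows
            if all((v.text or "").strip().lower() == "nan" for v in row.findall("v"))
        )
        result[f"{index}:{cf}:{pdp}"] = (unknown, len(rows))
    return result


SAMPLE = """<rrd>
  <step>60</step>
  <rra>
    <cf>AVERAGE</cf>
    <pdp_per_row>1</pdp_per_row>
    <database>
      <row><v> 1.2000000000e+03 </v><v> 8.0000000000e+02 </v></row>
      <row><v> NaN </v><v> NaN </v></row>
    </database>
  </rra>
</rrd>"""

print(unknown_rows_per_rra(SAMPLE))  # {'0:AVERAGE:1': (1, 2)}
```

On a real file you would pass the text printed by `rrdtool dump yourfile.rrd` instead of SAMPLE.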

Also, send the developers some love (and / or money). As mentioned in the 0.8.7 release notes, they all have Day Jobs and contribute to Cacti because they WANT TO. I'll follow this thread and do what I can to assist with anyone's issues. Please try to provide as much information as possible!

Post by **zeki** » Fri Nov 09, 2007 9:32 am

great post. lots of good info

Post by **torstentfk** » Fri Nov 09, 2007 10:05 am

Hi,

If I don't change the poller to 1 minute (it currently runs every 5 minutes), how can Cacti collect data at 1 minute intervals?

Torsten

Post by **tekbot** » Fri Nov 09, 2007 11:13 am

Read my post above. Basically, the steps you need to follow are these:

* Keep cron entry for cacti at 5 minutes
* Configure new RRAs to store 1 minute data
* Configure Data Sources to collect data in 1 minute intervals
* Associate the Graph Templates to the new Data Sources
* Set Poller to 1 minute under Console -> Settings -> Poller
* Create new graphs for devices using these Graph Templates

If I have some time, I can create a walkthrough specific to 0.8.7, but if you read my post, you should be able to do this on your own.

I may also export my Data Templates, RRAs and Graph Templates, again, if I have the time.

tekbot

Post by **zeki** » Wed Nov 14, 2007 4:54 pm

tekbot's post should be a sticky!

Post by **schef4711** » Sat Dec 01, 2007 2:08 pm

Hello,

I have installed a fresh Cacti 0.8.7a and changed the templates as described by tekbot (thanks to him for the detailed description of how to do it).

In general it works fine, but there are gaps in the graphs. I don't think this is a configuration problem, because (I think) the graphs actually show different information depending on the traffic load.

One example, on the same Cisco and the same interface, with two graphs (bits and bytes):

The "bits" graph shows less data each minute than the "bytes" graph. Normally, if there is a gap in one graph, it should be the same in both, or am I wrong about this?

Maybe someone has had the same problem and resolved it. My cron runs every 5 minutes; in Cacti the cron interval is set to 5 minutes and the poller interval to 1 minute.

I don't get any errors in cacti.log. Only my system log shows the following errors, but these seem to be a general rrdtool problem:

Dec 1 19:03:02 storage01 rrdtool[31485]: segfault at 00002aae353dc6e0 rip 00002aae32952799 rsp 00007fff7839aae0 error 6
Dec 1 19:04:02 storage01 rrdtool[31521]: segfault at 00002b10267336e0 rip 00002b1023ca9799 rsp 00007fff87045790 error 6

I also poll other Cisco routers from the same machine, and each one gives different graphs: some with almost no gaps and some with many gaps, as shown above. So it seems to be a problem with the poller.

Here is my system configuration:
2 x AMD Opteron 848 2.2 GHz
4 GB RAM, 1 TB HDD (SATA on RAID 6)
Gentoo 2007.0
Kernel 2.6.22-r5
rrdtool 1.2.23-r1
cacti 0.8.7a

I'd be happy for any help.

bye alex

Post by **schef4711** » Sat Dec 01, 2007 2:21 pm

Hello again,

The same problem exists for Cisco CPU Usage, but I didn't change that template to 1 min. It is the original 5 min average!

So maybe this is a bug in the new Cacti version?

bye alex

Post by **tekbot** » Sat Dec 01, 2007 2:55 pm

Hi Schef,
I'm curious, what do you have your Poller Setting set to under Settings -> Poller? Also, how often is your crontab set to run?

They fixed the scheduling in this version of cacti, so devices will be polled according to the Step value configured in your Data Template. Here are the things I would check:
* Make sure your crontab is set up to run every 5 minutes (regardless of how often you want to poll your various devices).
* Check your poller interval in the web interface under Settings -> Poller.
* Double-check your Data Templates. If you want to poll your Ciscos once every minute, find your In/Out bits (95th Percentile) Data Template and make sure that only the 1 minute RRAs are selected, the Step is set to 60, and the Heartbeat for each of the Data Sources is set to 120.
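The checklist above can be written down as a few explicit rules. A hedged Python sketch: the class and function names are hypothetical, the values are entered by hand from the crontab, the Settings -> Poller page and the Data Template, and the rules are the rules of thumb from this thread (5-minute crontab, heartbeat at least twice the step), plus the assumption that the Data Template step should be a multiple of the poller interval. Cacti itself does not run these checks.

```python
from dataclasses import dataclass, field


@dataclass
class PollingSetup:
    crontab_minutes: int   # how often cron launches poller.php
    poller_interval: int   # Settings -> Poller: poller interval (seconds)
    cron_interval: int     # Settings -> Poller: cron interval (seconds)
    ds_step: int           # Data Template step (seconds)
    heartbeats: dict = field(default_factory=dict)  # data source -> heartbeat (s)


def find_problems(setup: PollingSetup) -> list:
    """Check a setup against the rules of thumb given in this thread."""
    problems = []
    if setup.crontab_minutes != 5:
        problems.append("crontab should launch poller.php every 5 minutes")
    if setup.cron_interval != setup.crontab_minutes * 60:
        problems.append("Cacti's cron interval does not match the crontab")
    if setup.ds_step % setup.poller_interval != 0:
        problems.append("Data Template step is not a multiple of the poller interval")
    for name, heartbeat in setup.heartbeats.items():
        if heartbeat < 2 * setup.ds_step:
            problems.append(f"{name}: heartbeat {heartbeat} is below twice the step")
    return problems


good = PollingSetup(5, 60, 300, 60, {"traffic_in": 120, "traffic_out": 120})
bad = PollingSetup(1, 60, 60, 60, {"traffic_in": 120, "traffic_out": 60})
print(find_problems(good))  # []
print(find_problems(bad))   # two problems: the crontab and traffic_out's heartbeat
```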

When I initially started working with 0.8.7 (and 0.8.7a), I configured my crontab to run every 1 minute and set the same values in the web interface. This got me gaps very similar to yours, so, it's imperative the crontab remains set to 5 minutes.

If everything is configured properly as described above, please post:
* your crontab configuration
* A screenshot of your Settings -> Poller page
* A screenshot of the Data Template(s) you're having troubles with (In/Out bits - 95th percentile, Cisco CPU, whichever).

I'll see if I can help.

Thanks,
tekbot.

Post by **schef4711** » Sat Dec 01, 2007 7:44 pm

Hi,

Here is my crontab (only the Cacti part):

Code:

*/5 * * * *     root    php /var/www/localhost/htdocs/cacti087a/poller.php >>/var/log/cacti/cactipoller.log 2>&1

The Cisco CPU stats (hourly 1 min and daily 5 min), from the default template:

All the other screenshots are attached. The Heartbeat and Step should be set correctly on all of them.

Maybe there is something wrong in the RRAs, because I don't know whether the X-Files Factor is right. The Daily (24h) RRA has the following entries:

Code:

Consolidation Functions (AVG + MAX)
X-Files Factor 0.5
Steps 1
Rows 4320
Timespan 86400

It's all a little bit curious, because I haven't changed anything in the default templates such as Cisco CPU; every entry there is still the default from the Cacti installation.
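On the X-Files Factor question: the XFF only matters when a row averages several polls (Steps greater than 1). A minimal Python sketch of that rule, simplified from rrdtool's documented behaviour (a row is unknown when the unknown share of its polls exceeds the XFF); `None` stands for an unknown poll, and the function name is illustrative.

```python
import math


def consolidate(pdps, xff=0.5):
    """Value of one RRA row built from its primary data points (polls).

    `None` marks an unknown poll. The row is unknown (NaN) when more than
    `xff` of its polls are unknown; otherwise it is the average of the
    known polls.
    """
    unknown = sum(1 for p in pdps if p is None)
    known = [p for p in pdps if p is not None]
    if not known or unknown > xff * len(pdps):
        return math.nan
    return sum(known) / len(known)


print(consolidate([10.0]))                          # 10.0  (Steps 1)
print(consolidate([None]))                          # nan   (Steps 1: a gap, whatever the XFF)
print(consolidate([10.0, None, 20.0, None, 30.0]))  # 20.0  (2 of 5 unknown)
print(consolidate([10.0, None, None, None, 30.0]))  # nan   (3 of 5 unknown)
```

With Steps 1, as in the Daily RRA above, each row holds exactly one poll, so a missing poll is a gap regardless of the XFF; the 0.5 setting is therefore unlikely to be the cause of gaps in that RRA.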

bye alex

Post by **schef4711** » Sat Dec 01, 2007 7:48 pm

Hi,

I forgot to mention: I never configured the crontab to run every minute, so it should not be a caching problem.

bye alex

Post by **schef4711** » Fri Dec 07, 2007 12:23 pm

Hello,

I found something in cacti.log:

Code:

12/07/2007 05:16:03 PM - CMDPHP: Poller[0] Host[5] DS[157] SNMP: v2: xxx.xx.xx.xxx, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.49, output: 3559650959
12/07/2007 05:16:03 PM - CMDPHP: Poller[0] Host[5] DS[151] SNMP: v2: xxx.xx.xx.xxx, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.38, output: 1356217592
12/07/2007 05:16:03 PM - CMDPHP: Poller[0] Host[5] DS[156] SNMP: v2: xxx.xx.xx.xxx, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.41, output: 0
12/07/2007 05:16:03 PM - CMDPHP: Poller[0] Host[5] DS[159] SNMP: v2: xxx.xx.xx.xxx, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.41, output: 115908344
12/07/2007 05:16:03 PM - CMDPHP: Poller[0] Host[5] DS[159] SNMP: v2: xxx.xx.xx.xxx, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.41, output: 0
12/07/2007 05:16:03 PM - CMDPHP: Poller[0] Host[5] DS[156] SNMP: v2: xxx.xx.xx.xxx, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.41, output: 115908344
12/07/2007 05:16:03 PM - CMDPHP: Poller[0] Host[5] DS[158] SNMP: v2: xxx.xx.xx.xxx, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.50, output: 908159914

Maybe "output: 0" means the poller got no data for that host, and this causes a GAP? Maybe this happens on hosts that have very low traffic or only occasional requests.

That might explain some graphs, but I have hosts that I know for sure never go a minute without a request, and I still get 0. So these gaps on a new installation are a bit crazy.

bye alex

Post by **tekbot** » Fri Dec 07, 2007 12:30 pm

I don't think that's it, Schef, but zoom in on the gaps to verify. If you see 0, it's fine; if you see NaN (which I expect), it's a bug in the poller / Data Template / configuration.

Since these are interface graphs, they use counter types (always incrementing). If no traffic is registered between polling intervals, the counter value stays the same and Cacti plots a rate of 0 rather than a gap (and a counter that always reads 0 will always graph as 0).
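To see why an unchanged counter reads as 0 rather than a gap, here is a simplified Python sketch of how a per-second rate is derived from two readings of an incrementing counter, including the 32-bit and then 64-bit wrap corrections that rrdtool applies to COUNTER data sources. It is an illustration, not rrdtool's code; apart from the posted value 115908344, the readings are made up.

```python
def counter_rate(previous: int, current: int, interval: float) -> float:
    """Per-second rate from two readings of an always-incrementing counter.

    If the new reading is lower, assume the counter wrapped: first try a
    32-bit wrap, then a 64-bit wrap (simplified from rrdtool's COUNTER logic).
    """
    delta = current - previous
    if delta < 0:
        delta += 2**32
        if delta < 0:
            delta += 2**64 - 2**32
    return delta / interval


# Unchanged counter, including one stuck at 0: the rate is 0, not a gap
print(counter_rate(0, 0, 60))                  # 0.0
print(counter_rate(115908344, 115908344, 60))  # 0.0
# 32-bit wrap: 296 bytes before the wrap plus 5704 after = 6000 bytes in 60 s
bytes_per_second = counter_rate(4294967000, 5704, 60)
print(bytes_per_second, bytes_per_second * 8)  # 100.0 800.0
```

Multiplying the byte rate by 8 gives bits per second, which is what a bits/sec graph shows.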

There are a few things that looked amiss on your Data Template that I'll comment on shortly.

Sorry about the delay in getting back to you, it's been a very busy week for me.

Do me one other favor when you have a chance: go to your cacti.log directory and run the following for 5-10 polling intervals.

Code:

tail -f cacti.log | grep xxx.xx.xx.xxx

where xxx.xx.xx.xxx is the IP address of one of the routers you're not getting data back for. Then, post that output to this topic.
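If the tail/grep output gets long, a small Python sketch like the one below can summarise it: it parses CMDPHP SNMP lines in the format Alex posted, lists data sources returning 0 or non-numeric output, and shows OIDs polled by more than one data source (as DS[156] and DS[159] do in the excerpt above). The regular expression is written against those posted lines only; treat it as an assumption about the log format in general.

```python
import re
from collections import defaultdict

# Matches the CMDPHP SNMP poll lines in the format posted above
SNMP_LINE = re.compile(
    r"Host\[(?P<host>\d+)\] DS\[(?P<ds>\d+)\] SNMP: v\d: (?P<ip>[^,]+), "
    r"dsname: (?P<dsname>\w+), oid: (?P<oid>[\d.]+), output: (?P<output>\S+)"
)


def parse_snmp_lines(lines):
    """Yield one dict per matching log line."""
    for line in lines:
        match = SNMP_LINE.search(line)
        if match:
            yield match.groupdict()


def suspicious(records):
    """Return (ds, dsname, output) for zero or non-numeric outputs."""
    return [
        (int(r["ds"]), r["dsname"], r["output"])
        for r in records
        if not r["output"].isdigit() or int(r["output"]) == 0
    ]


def shared_oids(records):
    """Map (oid, dsname) to the data source IDs polling it, if more than one."""
    by_oid = defaultdict(set)
    for r in records:
        by_oid[(r["oid"], r["dsname"])].add(int(r["ds"]))
    return {key: sorted(ids) for key, ids in by_oid.items() if len(ids) > 1}


PREFIX = "12/07/2007 05:16:03 PM - CMDPHP: Poller[0] Host[5] "
LOG = [
    PREFIX + "DS[157] SNMP: v2: xxx.xx.xx.xxx, dsname: traffic_out, "
    "oid: .1.3.6.1.2.1.2.2.1.16.49, output: 3559650959",
    PREFIX + "DS[156] SNMP: v2: xxx.xx.xx.xxx, dsname: traffic_in, "
    "oid: .1.3.6.1.2.1.2.2.1.10.41, output: 0",
    PREFIX + "DS[159] SNMP: v2: xxx.xx.xx.xxx, dsname: traffic_in, "
    "oid: .1.3.6.1.2.1.2.2.1.10.41, output: 0",
]

records = list(parse_snmp_lines(LOG))
print(suspicious(records))   # [(156, 'traffic_in', '0'), (159, 'traffic_in', '0')]
print(shared_oids(records))  # {('.1.3.6.1.2.1.2.2.1.10.41', 'traffic_in'): [156, 159]}
```

To use it on your own capture, save the grep output to a file and pass its lines (for example `open("router.log")`, a hypothetical file name) to `parse_snmp_lines`.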

Thanks,
tekbot
