[HOWTO] 0.8.7 and 1 minute polling
Settings Amiss (as mentioned above):
Your Cisco CPU usage Data Template appears to be the default, 5 minute averages all around with a Step of 300 and a Heartbeat (presumably) of 600. This is fine, but the fact that you're not getting data on this graph in addition to your interface graphs says to me that the issue is not in the fact that you created NEW RRAs as described in my first post on this thread.
So, the issue lies elsewhere. A few questions I'd like you to answer:
Are you using any plugins (namely boost)?
Are you using the 0.8.7a Spine binary or cmd.php for your poller?
Why are you running the poller as root? (this shouldn't really matter, but it's best practice to run the poller as a cacti user, and make sure the log file and RRA directory are owned by this cacti user).
How many devices in your infrastructure are you polling on this system? A lot? A few? Routers only? Windows / Linux boxes, switches, etc...?
Is this the only cacti installation on this server?
Please answer these questions and post the output from the tail | grep I requested previously and we should have a better idea of what's going on.
Thanks,
tekbot
Hi,
tekbot wrote: There are a few things that looked amiss on your Data Template that I'll comment on shortly. Sorry about the delay in getting back to you, it's been a very busy week for me.
Don't hurry, a delay will not be a problem: at the moment I only use the 1 min average in "test mode", so other work may be more important.
tekbot wrote: I don't think that's it, Schef, but zoom in on the gaps to verify. If you see 0, it's fine, if you see NaN (which I expect) it's a bug with the poller / Data Template / Configuration.
You're right. When I look inside the gaps I see "NaN" for Current/Average/Maximum, and 0 bytes (in the Bytes graph) or 0 Mbit in+out (in the 95% graph).
tekbot wrote: Do me one other favor when you have a chance: go to your cacti.log directory and run the following for 5-10 polling intervals.
I can do this, but it will take a little time to find the right graphs because at the moment each router/switch has many graphs. To reduce the output, I will create a new device with only 2 interfaces that shows the problem.
I will be back soon
thx a lot
alex
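As background on why the gaps show NaN rather than 0: RRDtool treats an interval as unknown when the time between two updates of a data source exceeds its heartbeat. The following is only a minimal sketch of that rule, assuming the Heartbeat of 600 mentioned above; the function name `unknown_intervals` and the timestamps are hypothetical and are not taken from the posted logs.

```python
def unknown_intervals(update_times, heartbeat):
    """Return (start, end) pairs where the gap between two updates exceeds the heartbeat.

    RRDtool would record such an interval as unknown (NaN), not as 0.
    """
    gaps = []
    for prev, cur in zip(update_times, update_times[1:]):
        if cur - prev > heartbeat:
            gaps.append((prev, cur))
    return gaps


# Hypothetical update timestamps in seconds; the polls at 900 and 1200 never arrived.
updates = [0, 300, 600, 1500, 1800]
print(unknown_intervals(updates, heartbeat=600))
```

With these sample values a single missed poll (a 600 second gap) would still be accepted, while two missed polls in a row push the gap past the heartbeat and produce an unknown interval.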
tekbot wrote: Settings Amiss (as mentioned above):
Your Cisco CPU usage Data Template appears to be the default, 5 minute averages all around with a Step of 300 and a Heartbeat (presumably) of 600. This is fine, but the fact that you're not getting data on this graph in addition to your interface graphs says to me that the issue is not in the fact that you created NEW RRAs as described in my first post on this thread. So, the issue lies elsewhere.
Surely there may be another problem, since I changed nothing in that data template, but the real question is where we should look for it. I think more and more that there is a bug in the poller.
tekbot wrote: A few questions I'd like you to answer:
Are you using any plugins (namely boost)?
Is this the only cacti installation on this server?
No. It is a fresh installation on a new server with nothing installed on it before. So there was no upgrading, adding or changing of any Cacti file.
tekbot wrote: Are you using the 0.8.7a Spine binary or cmd.php for your poller?
Why are you running the poller as root? (this shouldn't really matter, but it's best practice to run the poller as a cacti user, and make sure the log file and RRA directory are owned by this cacti user).
I use cmd.php as my poller; Spine isn't installed on that machine. Sure, it shouldn't run as root, and I can change that, but I don't think it will make a difference.
tekbot wrote: How many devices in your infrastructure are you polling on this system? A lot? A few? Routers only? Windows / Linux boxes, switches, etc...?
At the moment "localhost" is disabled. I have two Cisco switches (one with 63 graphs, the other with 35), two Cisco routers (23 and 29 graphs) and 3 Zyxel modems (4 graphs each). So in total 4 Ciscos with 150 graphs and 3 modems with 12 graphs. That is next to nothing for a Dual Opteron 848 with 4GB and 1TB storage.
tekbot wrote: Please answer these questions and post the output from the tail | grep I requested previously and we should have a better idea of what's going on.
The "tail -f" output is at http://www.buenosair.es/mrtg/20071207cactilog.txt with the last 13 polling outputs. DS[71] is the 5min CPU graph; DS[74] (95% graph) and DS[88] (Bytes Total graph) are the graphs with the gaps whose images I posted before.
If you need some other output please let me know.
thx a lot for helping
alex
Last edited by schef4711 on Fri Dec 07, 2007 2:15 pm, edited 1 time in total.
Here is also the complete poller output (cacti.log) for the graphs where I have gaps (these are not the only ones affected):
DS[71] (5min avg CPU): http://www.buenosair.es/mrtg/20071207cactiCISCOCPU.txt (270KB)
DS[74] (95% graph): http://www.buenosair.es/mrtg/20071207cactiCISCODS74.txt (2.9MB)
DS[88] (Bytes graph): http://www.buenosair.es/mrtg/20071207cactiCISCODS88.txt (2.9MB)
These greps cover the whole time since I installed Cacti and configured the graphs.
bye alex
Hi tekbot
After reading your guide, I was quite interested in updating my Cacti from 0.8.6j to 0.8.7a (waiting for the stable version), and I am looking to change the polling time to less than 5 minutes (maybe 3 or 4 minutes).
But your post has me thinking about whether to go for 1 min polling instead.
It would be much appreciated if tekbot could offer us some screen captures of your RRA, console setup, and the adjustments made in the data source, poller, etc.
Hope you have time for this, as I believe many would like to see how you do it.
Appreciate your help on this. Please advise. Thanks again.
KeyBoarD Is MightieR ThaN ThE sWorD, iF onLy ConNecTed tO tHe InTernET..
Sorry about the delay in getting back to you guys. Here's a handful of screenshots. The first is of my custom RRA settings. The next is of a modified CPU Data Template. I threw in one of my 10 second graphs as well to show the granularity -- this is a 12 hour view of two 10 second data sources with a CDEF that calculates the Net Gain and Loss. For more detailed information, refer to my earlier posts in this thread.
Hope all this helps!
- Attachments
- 03 - Client Connect.png (49.89 KiB): 12 hour view of a 10s graph. This graph includes 2 10 second data sources, and a CDEF that calculates the Net Gain / Loss.
- 02 - CPU Data Template (1 min).png (90.76 KiB): Modified Data Template for standard CPU Data Source. Note the selected RRAs, Step and Heartbeat values.
- 01 - RRA Settings.png (61 KiB): Custom RRA Settings for storing 10s, 1m, and 5m graph data as per my first post in this thread.
The part I don't understand about the 1 minute polling is this: if the poller is scheduled in cron to run every 5 minutes, how is data gathered in between those runs? In other words, if the poller isn't gathering data every 60 seconds, what is? Where do the other 4 samples come from?
soloslinger
soloslinger wrote: The part I don't understand about the 1 minute polling is this: if the poller is scheduled in cron to run every 5 minutes, how is data gathered in between those runs? In other words, if the poller isn't gathering data every 60 seconds, what is? Where do the other 4 samples come from?
This is confusing me as well...
From what I've gathered, for this to work, you need the following:
- The poller.php entry in the crontab set to */5 (every 5 minutes).
- [Settings -> Poller -> Cron Interval] set to "Every 5 Minutes".
- [Settings -> Poller -> Poller Interval] set to "Every 1 Minute".
If tek, or someone else, could confirm this for me, I'd greatly appreciate it.
Quite simply, if you set the cron interval to 5 minutes and the poller interval to 1 minute, the poller will run 5 times and exit.
If you set the cron interval to 5 minutes and the poller interval to 10 seconds, the poller will run 30 times and exit.
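The two cases above come down to one division. The snippet below is only an illustration of that ratio, not Cacti's poller code; the function name `polls_per_cron_run` is hypothetical, and the check that the cron interval is a whole multiple of the poller interval is an assumption made for the sketch.

```python
def polls_per_cron_run(cron_interval_s: int, poller_interval_s: int) -> int:
    """Number of polling passes a single cron-launched poller makes before it exits."""
    if cron_interval_s <= 0 or poller_interval_s <= 0:
        raise ValueError("intervals must be positive")
    # Assumption for this sketch: the cron interval divides evenly by the poller interval.
    if cron_interval_s % poller_interval_s != 0:
        raise ValueError("cron interval should be a whole multiple of the poller interval")
    return cron_interval_s // poller_interval_s


print(polls_per_cron_run(300, 60))  # cron every 5 minutes, poller every 1 minute
print(polls_per_cron_run(300, 10))  # cron every 5 minutes, poller every 10 seconds
```

The two calls print 5 and 30, matching the numbers given above.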
If you ever change a poller interval for an existing data source, you have to delete the corresponding rrdfiles (sorry, it's rrdtool).
If you change a poller interval for a data template, you should likely repopulate your poller cache to re-distribute the polling of data sources.
If you were previously polling at 1 minute with a 5 minute RRD to compensate for not having a 64bit counter available, then you have a problem as that was not considered as a part of the design. What I mean by that is that if you have 32bit counters and you poll a device 5 times in 5 minutes to allow RRDtool to store the average of those 5 samples, the design of the poller interval did not take that into account. I suspect that is a corner case as most "high bandwidth" devices are "modern" (net-snmp 5.2++) and otherwise are network electronics which typically support snmpv2/3 and 64bit counters.
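For anyone unsure whether the 32bit counter case applies to them, the rollover time is easy to estimate. This is a rough sketch that assumes the data source is an octet (byte) counter such as ifInOctets and that the traffic rate is sustained; the function names are made up for illustration.

```python
COUNTER32_RANGE = 2 ** 32  # a 32-bit counter rolls over after this many counts


def seconds_to_wrap(bits_per_second: float) -> float:
    """Seconds for a 32-bit octet counter to roll over at a sustained bit rate."""
    bytes_per_second = bits_per_second / 8
    return COUNTER32_RANGE / bytes_per_second


def rate_limit_for_interval(poll_interval_s: float) -> float:
    """Bit rate below which the counter advances less than one full cycle between polls."""
    return COUNTER32_RANGE * 8 / poll_interval_s


for rate in (10e6, 100e6, 1e9):
    print(f"{rate / 1e6:.0f} Mbit/s: wraps after {seconds_to_wrap(rate):.1f} s")

for interval in (300, 60, 10):
    limit_mbit = rate_limit_for_interval(interval) / 1e6
    print(f"{interval} s polling: safe below {limit_mbit:.1f} Mbit/s")
```

At a sustained 100 Mbit/s the counter wraps after roughly 344 seconds, so a 5 minute poll is only just safe, and anything much faster needs either shorter polling or 64bit counters.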
TheWitness
True understanding begins only when we realize how little we truly understand...
Life is an adventure, let yours begin with Cacti!
Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages
For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
Hi tekbot,
Just a quick post to say thanks, this has proved to be a really useful thread for understanding how the 1 minute poller works.
The last post from TheWitness also really made it click for me.
This 1 minute polling is already giving me greater visibility of my network. See picture for how easy it is to miss some short-lived traffic spikes.
To The Forum Admins - I know it's already a Sticky, but could/should tekbot's post be moved or linked to the 'How To' Section of the Forum?
(I'm also sure that we're all eagerly awaiting the release of version 0.8.8 too)
Thanks to all the Cacti Team.
Graham.
- Attachments
- 1-min-test.png (69.98 KiB)
I have my cron and poller intervals set to 1 minute as well as my crontab. I have also created most data sources with interval 60 and heartbeat 120.
Still unclear after reading this what I am actually missing? Are the rrds being updated every minute for five minutes with the same number?
One reason I do this is so that weather maps are recreated every minute, which works a treat, but I am confused, as I am pretty sure the numbers change every minute.
Not have to recreate graphs
Hi all,
Is there a way to avoid recreating the 1500+ graphs that I have if I want 1 min resolution? Of course old data would stay at the old resolution, but could new data be added at 1 min resolution?
Recreating them all by hand would be such a waste.
Thanks,
Completely confused
Please forgive my ignorance. I have read this post over and over trying to understand how to get one minute polling to work and be reflected in my graphs. I understand leaving cron to run every five minutes and setting the poller interval to one minute. This basically starts the poller every five minutes, polls the devices once a minute for five minutes, and then the poller process ends. My understanding ends here. In tekbot's long post regarding the custom RRAs and templates, he states that step is defined as how many polls are required to average the data and enter it into the RRD. So, to get the 10 second granularity that he describes, his polling interval has to be set to 10 seconds, right? For his 1 minute average, his step is defined as 1; shouldn't that be 6 (6 polls x 1 minute)? For his five minute average, his step is still defined as 1; shouldn't that be 30 (6 polls x 5 minutes)?
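One way to untangle the two meanings of "step" in the question above, offered as a sketch of RRDtool's general semantics rather than a description of tekbot's exact settings: the data source step is the base interval in seconds, while an RRA's steps value counts how many of those base intervals are consolidated into one stored row. The function name and the sample values below are illustrative assumptions.

```python
def rra_row_seconds(ds_step_s: int, rra_steps: int) -> int:
    """Seconds of data represented by one stored row of an RRA."""
    return ds_step_s * rra_steps


# (data source step in seconds, RRA steps) pairs chosen only for illustration.
examples = [
    (10, 1),   # 10 second data source, every sample stored as-is
    (60, 1),   # 1 minute data source, every sample stored as-is
    (300, 1),  # 5 minute data source, every sample stored as-is
    (60, 5),   # 1 minute data source averaged into 5 minute rows
    (10, 6),   # 10 second data source averaged into 1 minute rows
]

for ds_step, steps in examples:
    row = rra_row_seconds(ds_step, steps)
    print(f"step={ds_step}s x steps={steps} -> one row per {row}s")
```

Under this reading, an RRA with steps of 1 can serve as the 10 second, 1 minute or 5 minute archive depending on the step of the data source that uses it; a value of 6 would only be needed to average a 10 second data source into 1 minute rows.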