increase polling frequency to 1 minute (maybe more)

fstruman
Posts: 2
Joined: Mon Nov 07, 2005 1:03 pm

Post by fstruman »

Hello,

I just finished installing Cacti. It is up and running! Now I need to increase the polling frequency of poller.php to at least one poll per minute. The reason is that I need to monitor packet drops on our connection.

Is this supported? How can I configure it in Cacti?

Thanks!
Francois
rkramer
Cacti User
Posts: 54
Joined: Fri Jun 03, 2005 12:25 pm

Post by rkramer »

No offense, but it must have taken longer to type that message than to click the search button above...

http://forums.cacti.net/viewtopic.php?t ... cy++minute

Post by fstruman »

Thanks for your answer and the post reference (I'll search more through the posts next time).
Unfortunately I am among the Cacti 0.8.6 users, which means the patch does not work for me.

Any idea if or when this feature will be supported in future Cacti releases? Or would you advise downgrading to 0.8.5 so I can apply the patch?
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

fstruman wrote: Hello,

I just finished the installation of cacti. It is up and running! Now I need to increase the polling frequency of poller.php to at least 1 interrogation per minute. The reason is I need to monitor the packet dropping on our connection.

Is it supported ? How can I configure this in Cacti ?

Thanks!
Francois

I do not understand. Why do you think sampling every 60 sec. would be better for monitoring drops/errors? These are COUNTER values in the standard MIBs. So even if you read those values at a 300 sec. interval, you will see the increase of those counters. And if you're looking for the absolute increase per interval instead of the increase/sec, you'll use the Make per 5 Minute CDEF.
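To make the point about COUNTER values concrete, here is a minimal Python sketch. The readings are invented for illustration and do not come from a real device. It shows that reading a cumulative drop counter every 300 seconds accounts for the same total as reading it every 60 seconds; only the time resolution differs.

```python
# Hypothetical cumulative drop counter, read once per minute for 10 minutes.
# These values are made up purely for illustration.
readings_60s = [1000, 1000, 1012, 1012, 1012, 1040, 1040, 1041, 1041, 1041, 1050]


def deltas(samples):
    """Increase of a cumulative counter between consecutive samples."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


# A 5-minute poller sees every 5th reading of the very same counter.
readings_300s = readings_60s[::5]

per_minute = deltas(readings_60s)
per_five_minutes = deltas(readings_300s)

# Both views account for every drop; only the granularity differs.
total_60s = sum(per_minute)
total_300s = sum(per_five_minutes)
print(per_minute, total_60s)
print(per_five_minutes, total_300s)
```

What the 5-minute view loses is the knowledge of which minute inside the interval the drops happened in, not the drops themselves.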
Reinhard
rony
Developer/Forum Admin
Posts: 6022
Joined: Mon Nov 17, 2003 6:35 pm
Location: Michigan, USA

Post by rony »

The only practical reason I have seen for 1-minute intervals is a device with Gigabit interfaces but only 32-bit counters that carries a lot of traffic.
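A rough back-of-the-envelope check of the 32-bit counter case, as a Python sketch. It assumes an octet (byte) counter that rolls over at 2^32 and a poller that can correct for at most one rollover per interval; the function names are only for illustration.

```python
# A 32-bit octet counter rolls over after 2**32 bytes.
COUNTER_MAX = 2 ** 32


def seconds_to_wrap(bits_per_second):
    """Time for the counter to roll over once at a sustained rate."""
    return COUNTER_MAX * 8 / bits_per_second


def max_unambiguous_rate(poll_interval_s):
    """Sustained bits/s above which the counter can roll over more than
    once between two polls, so the delta can no longer be trusted."""
    return COUNTER_MAX * 8 / poll_interval_s


wrap_at_gigabit = seconds_to_wrap(1_000_000_000)
limit_300s = max_unambiguous_rate(300)
limit_60s = max_unambiguous_rate(60)

print(f"wrap at 1 Gb/s: {wrap_at_gigabit:.1f} s")
print(f"5-minute polling is safe up to about {limit_300s / 1e6:.1f} Mb/s")
print(f"1-minute polling is safe up to about {limit_60s / 1e6:.1f} Mb/s")
```

Under these assumptions, shortening the interval from 300 to 60 seconds raises the sustained rate the counter can handle fivefold, though a link running at a full gigabit would still roll the counter over in well under a minute.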
Tony Roman
kghammond
Posts: 9
Joined: Wed Nov 16, 2005 5:36 pm

Post by kghammond »

I am looking for one-minute or faster intervals as well. As in other posts on both threads, it seems there is a solution for the previous version but not for the current version.

Those that know how to alter Cacti to a one minute interval do not seem to be posting any instructions so that others can try to accomplish the same goal.

And here the last few posts just want to criticize those that want faster polling intervals.

I will give you a few reasons for faster polling intervals. We need to pick up spikes in any resource utilization. Why are spikes bad? Spikes make a resource perform slowly for a short period of time, thus giving the customers (internal or external) a bad experience. Bad experiences make for unhappy customers.

In our case this can be a spike in bandwidth on a router, or it could be a spike in a CPU or it could be a spike in concurrent connections to a device, etc.

So, anyone have any docs or insight on how to modify cacti to one minute?

Back to searching some more...

Post by gandalf »

No intention to offend anyone.
But I have had a hard time discussing with people how to interpret results with/without aggregation, with/without MAX values, and the like. So I want to ask everyone whether switching to 1-minute polling really is the ultimate solution.
As I have NOT done this so far, I can only give some hints, which need to be worked out a bit more. But let's start.
POLLING
Obviously, you would need to change the polling interval in crontab to 1 minute. Make sure the poller finishes within this time. Take care, because one of the later steps (modifying the RRAs) will result in a longer poller runtime, depending on the number of rrds and the speed of your storage subsystem.
RRAs
Change the STEP values from 300 to 60 (the interval between polling cycles). Increase the number of data points at least fivefold to store data at maximum precision for the same amount of time as now (about 2 days). Increase to higher values if you want to store precise data for more than 2 days (this will increase the size of the rrd files accordingly).
Change the HEARTBEAT value from 600 to 120 (or the like). See the rrdtool doc for more details: http://people.ee.ethz.ch/~oetiker/webto ... te.en.html
Change the TIMESPAN values. They are NOT rrdtool values but define the timespan used to display the detail graphs. ATTENTION: If you do not change this, Cacti will still display e.g. the daily graph with e.g. 500 pixels (x-axis) but will have to display five times as many data points as usual. This will result in a "graphical aggregation" of data, because there are not as many pixels as needed to display each 1-minute value for a whole day. Of course, the precision will rise when zooming into the graph (this is one of the interesting aspects for inexperienced users, as peaks will suddenly be shown that could not be seen before).
I do not remember exactly which of those values sits where (RRA, Global Settings, Data Template, ...), as I do not have a Cacti system at hand at the moment.
ALL CDEFs that do time calculations (e.g. Make per Minute) will have to be revised.
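The "fivefold" figure for the number of data points can be checked with a small Python sketch. It assumes the usual RRD rule that one row covers step × steps seconds, and it takes "about 2 days" as exactly 48 hours, which is only an approximation of the real default.

```python
def rows_for_retention(retention_seconds, step_seconds, steps_per_row=1):
    """Number of RRA rows needed to cover a retention period."""
    seconds_per_row = step_seconds * steps_per_row
    # Ceiling division, so the archive never covers less than requested.
    return -(-retention_seconds // seconds_per_row)


TWO_DAYS = 2 * 24 * 3600  # "about 2 days", taken as exactly 48 hours here

rows_at_300s = rows_for_retention(TWO_DAYS, 300)
rows_at_60s = rows_for_retention(TWO_DAYS, 60)

print(rows_at_300s, rows_at_60s, rows_at_60s / rows_at_300s)
```

The same helper can be used to size the rows for any longer retention period before committing to it.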

According to the authors, this is not supported at the moment.

But I hope this helps.
Reinhard
matguy
Posts: 23
Joined: Tue Nov 29, 2005 7:24 pm
Location: Seattle, WA

Post by matguy »

Any updates on this?

Basically, I want to do the opposite and increase the time between polls. I've had my polling set to 5 minutes since my initial install and am constantly running into a wall when adding new scripts, usually vb-wmi connections. Right now I'm sitting at over 6 minutes of polling time and I have more to add. As much as it would be nice to consolidate some of the scripts to be more efficient, I use a few scripts on many servers and would rather re-use them and pass in arguments than hard-code anything into the scripts and have to edit them directly every time I add more things to check. The thing is, I really don't need them to run every 5 minutes; 10 is fine. What do I need to do to make Cacti happy with changing the scheduled poller process to 10 minutes? Most of what I'm doing is checking file sizes on our Exchange databases. While there is plenty of other stuff I'm checking, a 10-minute interval is plenty.

Post by matguy »

Oh, well, here's the problem with just doing a search for this issue: mine isn't Unix-specific; if anything, it's Windows. Sorry.

Post by gandalf »

Moved to Win32 Specific
But I'm wondering why Cacti's polling should take 6 minutes. There are big installations out there with thousands of data sources and rrds...
If you don't need such frequent polling, almost the same holds as for increasing the polling frequency. Of course, the drawbacks will not hurt :lol:
Just let me list some things to keep in mind:
- changing polling frequency in crontab (or the windows counterpart of that)
- changing rra definitions: this depends on the time each rra should span
- changing step size of each data template
- changing heartbeat of each data template
- changing rra/step/heartbeat for each existing rrd you want to keep but adjusted to the new definitions (rrdtool tune will help)
Perhaps it will be necessary to change Graph Definitions (timespan) as well
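For the existing rrd files, the heartbeat part of that list could be scripted roughly as follows. This Python sketch only builds the `rrdtool tune --heartbeat` command line and does not run it. The file name is hypothetical, the data source names are assumed, and a heartbeat of twice the polling interval simply mirrors the 300/600 ratio of the defaults. Changing the step or the RRA layout of an existing file is a separate matter that this sketch does not cover.

```python
def heartbeat_tune_command(rrd_path, ds_names, heartbeat_s):
    """Build an `rrdtool tune` argv that sets the heartbeat of each data source."""
    argv = ["rrdtool", "tune", rrd_path]
    for ds_name in ds_names:
        argv += ["--heartbeat", f"{ds_name}:{heartbeat_s}"]
    return argv


poll_interval_s = 600  # 10-minute polling

cmd = heartbeat_tune_command(
    "rra/example_traffic_in_1.rrd",   # hypothetical file name
    ["traffic_in", "traffic_out"],    # assumed data source names
    2 * poll_interval_s,              # same 1:2 ratio as the 300/600 defaults
)
print(" ".join(cmd))
```

Try it on a copy of one rrd file first before touching the real ones.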
Reinhard
eddievenus
Cacti User
Posts: 60
Joined: Mon Jul 18, 2005 7:01 pm

Post by eddievenus »

I never noticed this post before, but I have been doing 1 minute polling on at least a dozen Cacti installs, most on CactiEZ which means linux, but some on Windows as well.

It is very easy, but it takes some time to get it all moved over from 5 min to 1 min.

In this post I will tell you how, in the following one I will explain why 1 min is vastly superior to 5 min, while monitoring anything.

Step 1. If polling is running, stop it, archive all your graphs, and clear them out so new ones will be made. If this is a new install, do not make any graphs before you begin. They will not work after changing the time period.

Step 2. Change the polling frequency to 1 min. In CentOS (Red Hat) you change the crontab. In CactiEZ the crontab for the poller runs under the apache user. Simply enter 'crontab -e -u apache' and you will be editing the apache crontab. Find the line that mentions poller and change it so that it has 5 *'s and no numbers at the beginning. This will make it poll every minute. Hit Escape and type :wq to save and exit the vi editor.
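As an illustration of what "5 *'s and no numbers" means, here is a small Python sketch that rewrites the schedule part of a crontab entry. The example entry, including the PHP and poller paths, is hypothetical and will differ per install.

```python
def poll_every_minute(crontab_line):
    """Replace the five schedule fields of a crontab entry with asterisks."""
    fields = crontab_line.split(None, 5)  # 5 schedule fields, then the command
    if len(fields) < 6:
        raise ValueError("not a crontab entry with a command")
    return " ".join(["*"] * 5 + [fields[5]])


# Hypothetical original entry; the real paths will differ per install.
before = "*/5 * * * * php /var/www/html/poller.php > /dev/null 2>&1"
after = poll_every_minute(before)
print(after)
```

The five asterisks stand for minute, hour, day of month, month and day of week, so the command runs every minute of every day.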

Under windows you only need to find the task you made, and enter the advanced properties to have it run every minute. Easy enough so far, right?

Step 3. Make a new RRA for the 1-minute polling. I called mine Hourly (1 Minute Average), but you can call it whatever you want. This is done in Cacti's web interface, under Data Sources, then RRAs. Click Add, then use these values:
Consolidation Functions - Select them all
X-Files Factor - 0.5
Steps - 1
Rows - 525600
Timespan - 3600

This makes one year's worth of data at 1-minute resolution. You may wish to change yours if the RRD file sizes are too big. My files are usually 16-32 MB in size. That is more than manageable, I think, considering that even on old hardware I have 30+ GB to work with. However, your environment may be different.
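The 16-32 MB figure is easy to sanity-check. This Python sketch assumes that "select them all" means four consolidation functions and that each stored value takes 8 bytes (a double); it ignores the file header and the other, much smaller RRAs.

```python
BYTES_PER_VALUE = 8  # each stored data point is a double


def rra_bytes(rows, data_sources, consolidation_functions):
    """Approximate space used by one RRA definition across all selected CFs."""
    return rows * data_sources * consolidation_functions * BYTES_PER_VALUE


ROWS = 525600  # one year of 1-minute rows

one_ds = rra_bytes(ROWS, data_sources=1, consolidation_functions=4)
two_ds = rra_bytes(ROWS, data_sources=2, consolidation_functions=4)

print(f"1 data source:  {one_ds / 2**20:.1f} MiB")
print(f"2 data sources: {two_ds / 2**20:.1f} MiB")
```

So a file with one data source lands near 16 MiB and one with two (for example traffic in and out) near 32 MiB, which lines up with the sizes quoted above.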

Now you need to change the other RRAs too. Like this.
Name                         Steps  Rows    Timespan
Hourly (1 Minute Average)    1      525600  3600
Daily (5 Minute Average)     1      600     86400
Weekly (30 Minute Average)   6      700     604800
Monthly (2 Hour Average)     24     775     2678400
Yearly (1 Day Average)       288    797     33053184
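A quick way to see what each RRA in that table covers once the base step is 60 seconds. This Python sketch assumes the usual RRD rule that one row spans Steps × step seconds. Under that assumption, the Steps values carried over from the 5-minute defaults give shorter averaging periods than the RRA names suggest, while the 1-minute RRA covers a full year.

```python
BASE_STEP_S = 60  # data template step after the change

# (name, steps, rows) exactly as listed in the table
rras = [
    ("Hourly (1 Minute Average)", 1, 525600),
    ("Daily (5 Minute Average)", 1, 600),
    ("Weekly (30 Minute Average)", 6, 700),
    ("Monthly (2 Hour Average)", 24, 775),
    ("Yearly (1 Day Average)", 288, 797),
]

resolution_s = {}
coverage_days = {}
for name, steps, rows in rras:
    resolution_s[name] = steps * BASE_STEP_S                  # seconds per row
    coverage_days[name] = steps * rows * BASE_STEP_S / 86400  # total span

for name, _, _ in rras:
    print(f"{name:28s} {resolution_s[name]:6d} s/row  {coverage_days[name]:7.1f} days")
```

If you want the coarser RRAs to keep their original meaning, the same arithmetic shows which Steps values to try, but treat that as something to test on your own install.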

This is what I use, and it has worked for me for the last 8 months or so. I have resolution down to 1 minute (when I zoom in) on every graph for up to 1 year. You can of course change that to your liking, but that is what I wanted, so that's what I made it do.

Step 4. Change ALL Data Templates to use the new RRA. Go into the Data Templates page, start with the first one and work your way to the end. This is what takes the most time. I am sure there is an easier way to do this, but I don't know what it is.

Here is what you change in the Data Templates. Under Associated RRA's, select all of them. Then under Step, change the value from 300 to 60. Then down below, under Data Source Item -> Heartbeat, change the value from 600 to 120. Be careful: many templates have several different Data Source Items, and you need to change the heartbeat for each one. That is annoying to say the least, but the upside is you only have to do this whole process once.

Step 5. Start graphing. Now it works just like before, only you have 1-minute resolution and a new view for your graphs too. You can now choose to view any time range in Hourly (1 Minute Average), which means that even if a graph starts out showing a 5-minute average, zooming in will eventually take it down to the 1-minute level.

Good luck, let me know if you have a problem following the directions, I did them mostly from memory, so some stuff may be slightly different in your environment.

The Why.

Post by eddievenus »

Ok, now the why.

At no time is one 5-minute snapshot EVER better than five 1-minute snapshots. Remember that this is exactly what SNMP polling does: it takes a snapshot of the traffic at that exact second. Sure, you can say that statistically, over 1 day, one poll every 5 minutes should show you roughly the same average information as one poll every minute, but that is not how the real world is. Look at the disparity between 5 minute and 30 minute, or between any 2 RRAs for that matter. The smaller the time span the RRA works with, the more likely you are to see the highs and lows, which means a more accurate depiction of real traffic.

Network traffic is not much different than driving cross country. If you were on a bus heading cross country and you only opened your eyes for 1 second every 5 minutes, you would miss a lot. Sure, you would see the cities, and probably even some of the towns, and sheer odds say that you would see a few random small things like a passing car or a stop sign, but by and large you would miss 99.6667% of the things passing by and would see things out of context.

Now say that you opened your eyes for 1 second every minute. You would still miss 98.333% of the things passing by, but you would likely see a lot more of the towns you pass, more of the stop signs you stop at, and more of the random things that pass by for only a second. You would also have a better idea of how long it took you to pass that city. Before, you might pass Beloit, Wisconsin and happen to open your eyes once and catch it, and by the next time you open your eyes it is gone. Now that you are opening them more often, you may catch it 2 or 3 times before it is gone. You know that Beloit takes at least 2 or 3 minutes to drive past. You may pass Atlanta, Georgia and catch it 10 times at 5 min intervals and assume that it takes 50 min to pass, but at 1 min intervals you see it 54 times and know that it actually takes 54 minutes to pass.
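The percentages in the bus analogy come from simple arithmetic, sketched here in Python. It models only the analogy itself (eyes open for 1 second per interval) and says nothing about how any particular poller measures traffic.

```python
def percent_missed(interval_s, look_s=1):
    """Share of time not observed when looking for look_s seconds per interval."""
    return 100 * (1 - look_s / interval_s)


missed_5min = round(percent_missed(300), 4)
missed_1min = round(percent_missed(60), 4)
missed_30s = round(percent_missed(30), 4)

print(missed_5min, missed_1min, missed_30s)
```

Each halving of the interval doubles the share of time you actually look, although that share stays small in every case.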

Aside from all that, at 1 min intervals you are more likely to catch the Sears Tower go by while passing Chicago, and the Empire State Building while passing New York. Thus you are more likely to catch the big events, and know how long they last. And correct me if I am wrong, but that is the main reason why we are watching our bandwidth right? To see how much is being used and how often?

I know that this analogy has its limits, and I know that even at 1 minute intervals you will miss things. I also know that some people just want rough approximations and averages. But not me, and if I don't want it there are others that feel the same as I do. I am not an island of 1 minute intervals in a river of 5 min purists.

This debate could go on for a while, but the only reason I see for not doing 1 min resolution is that if the polling time gets too close to the polling interval, it could cause bad data. If the poller takes longer than 60 sec to complete, it will back up on itself and cause spikes and hiccups in the data. This is a fact. With 5 min polling the poller has 300 sec to do all it has to do, which is plenty of time. However, I tune all my installations to be fast: I use Cactid with multiple threads and, if possible, multiple concurrent processes so that each one can use multiple threads. I keep the number of interfaces down to reasonable figures and RRDs in the mid to low hundreds at most.

Dual cores or better and lots of RAM make room for more RRDs, but this is still too fast for any really large operation. The only fix is to throw more CPUs and RAM at it, which could mean clustering at some point too. However for some it is just not doable in 60 sec.

If size is of concern, then this is just not important enough. HD space is so cheap these days, even SCSI space can be had for almost nothing anymore. In fact I have noticed that I have yet to breach 6 gig, even with the syslog database going full tilt and hundreds of RRDs all at 1 min resolution.

Finally, I would like to note that I do not use averages either; I use MAX values for all graphing. On a 5 min average, a 12MB file going across a T1 line (1.544Mb/s) shows up as ~310kb/s. Who cares about 310k on a T1? However, for over 1 minute your whole pipe was maxed out. You never would have seen that on a 5 min average. Even a 1 min average would be misleading most of the time for file sizes smaller than 10 meg.
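The T1 example can be worked through in a few lines of Python. This sketch assumes decimal megabytes (12 MB = 12,000,000 bytes) and the nominal 1.544 Mb/s line rate with no overhead, so the result lands in the same ballpark as the ~310kb/s quoted rather than matching it exactly.

```python
FILE_BITS = 12 * 1_000_000 * 8  # 12 MB file, decimal megabytes assumed
T1_BPS = 1_544_000              # nominal T1 line rate, overhead ignored
FIVE_MINUTES_S = 300

# The same transfer, seen as an average over one 5-minute interval
average_5min_bps = FILE_BITS / FIVE_MINUTES_S

# How long the line is actually saturated while the file goes across
transfer_seconds = FILE_BITS / T1_BPS

print(f"5-minute average: {average_5min_bps / 1000:.0f} kb/s")
print(f"line saturated for about {transfer_seconds:.1f} s")
print(f"average is {100 * average_5min_bps / T1_BPS:.0f}% of line rate")
```

In other words, the line is full for roughly a minute while the 5-minute average reports it as only about a fifth used.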

The moral of the story is to use MAX values and shorten the polling interval. That same example under 1 min polling and MAX values would show 2 spikes up to the max bandwidth of 1.536Mb/s (a T1's actual usable bandwidth after overhead). This example can be reused under any line condition and will always have the same result: 1 min polling and MAX values show the most accurate data.

Thanks for reading my thoughts on this, I am sorry however if you did not find it interesting or useful since this is several minutes of your life you can never have back.

EddieVenus

Post by eddievenus »

As a side note, I have been doing all my CactiEZ installs in VMware on Windows XP desktops with 512 MB of RAM or better and PIII 733 or better CPUs.

I have noticed that VMware needs a crontab entry that runs NTP every minute to avoid the clock slip problem, where the virtual machine gets way ahead of the real PC's clock. But once that is done and they can stay in sync, this is really a great solution and easy to boot.

Just make a VM of CactiEZ, make all the changes you want, add all the plugins you need, remove what you want to. Change the polling frequency, etc and when it is all done, just copy that VM. Now deploy whenever and wherever you want to. One note, it is slower than a stand alone PC would be, but in small installs this is not noticeable.

I only mention this here because it has a bearing on my previous comments about speeds and such, as well as on my use of Cacti. Not everyone will use it as I do, and not everyone needs 1 min polling, but for those who do, the guidelines above should help you get there.

As another side note, I checked 5 of my Cacti installs and see that their poller run times are averaging 7 seconds and maxing out under 20 seconds. So in theory I could comfortably boost the resolution to 30 seconds, or 2 polls a minute, in some cases. This would of course double the file size again, as well as the load on the CPU every minute. It would also mean that tolerances have to be even tighter to avoid bad data. But it would double my coverage again: I miss 99.6667% at 5 min, 98.3333% at 1 minute, and 96.6667% at 30 sec. That means that I would actually catch 3.333% of all traffic on the line, and spikes lasting less than 1 min would be much more likely to be seen. Also, at this point average values and max values will be much closer to each other per poll. I have no plans to do so, but someone could if they saw fit.

Enjoy.
jacauc
Posts: 34
Joined: Sun Sep 10, 2006 1:05 am

Post by jacauc »

Wow! Thanks Eddievenus for taking the time to document all of this.

I have followed your steps (running windows), and changed all the data templates to the 120/60 values as you specified. This is pretty much what I need. Thanks! (Goes much quicker if you edit the database directly :D)


I would like to have a kind of "realtime" monitor on our equipment, that is why I am doing the 1 minute intervals. We have Solarwinds generating graphs for our WAN link, and we want to replace this with something similar using Cacti.

When I look at the solarwinds graph (attachment) I can pretty much see what's happening on the link "NOW".
In cacti I see a VERY spiky graph....more or less the same trend, but much more ups and downs... this I'd like to slightly "average out"... is this possible?



Any ideas on how to go ahead?

Thanks again!
jacauc
Attachments
solarwinds.jpg (44.88 KiB)
zeryl
Posts: 1
Joined: Wed Sep 13, 2006 2:57 am

Post by zeryl »

eddievenus wrote: Step 4. Then under Step change that from 300 to 60. Then down below under Data source Item -> Heartbeat change that from 600 to 120. Be careful, many templates have several different Data Source Items, you need to change the heartbeat for each one, which is annoying to say the least, but the upside is you only have to do this whole process once.

An easy SQL query to change this is:

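-- Set the step to 60 s wherever it is still at the 300 s default
UPDATE data_template_data SET rrd_step = 60 WHERE rrd_step = 300;
-- Set the heartbeat to 120 s wherever it is still at the 600 s default
UPDATE data_template_rrd SET rrd_heartbeat = 120 WHERE rrd_heartbeat = 600;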
This assumes you left the Step at 300, and the heartbeat at 600.
This will change all of it in one go :)

You still have to select the new RRA manually, I haven't looked to see how to do it quickly.
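If you want to see what those two statements do before touching the real database, here is a dry run against an in-memory SQLite database in Python. The two toy tables keep only the columns named in the query, plus an id; the real Cacti tables have many more columns and live in MySQL, so this is purely an illustration of the WHERE logic.

```python
import sqlite3

# Toy stand-ins for the two tables; the real ones have many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_template_data (id INTEGER PRIMARY KEY, rrd_step INTEGER);
CREATE TABLE data_template_rrd (id INTEGER PRIMARY KEY, rrd_heartbeat INTEGER);
INSERT INTO data_template_data (rrd_step) VALUES (300), (300), (120);
INSERT INTO data_template_rrd (rrd_heartbeat) VALUES (600), (600), (240);
""")

# The two statements from the post, unchanged in meaning.
conn.execute("UPDATE data_template_data SET rrd_step = 60 WHERE rrd_step = 300")
conn.execute("UPDATE data_template_rrd SET rrd_heartbeat = 120 WHERE rrd_heartbeat = 600")

steps = [row[0] for row in conn.execute(
    "SELECT rrd_step FROM data_template_data ORDER BY id")]
heartbeats = [row[0] for row in conn.execute(
    "SELECT rrd_heartbeat FROM data_template_rrd ORDER BY id")]
conn.close()

print(steps, heartbeats)
```

Rows that were already changed away from the defaults (120 and 240 in the toy data) are left alone, which is the point of the WHERE clauses.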