how to clear out a LARGE spike that screws up the graph
- TheWitness
- Developer
- Posts: 17047
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
I guess I should add a new "utility" to schedule spike removal. I just don't have the cycles right now.
TheWitness
True understanding begins only when we realize how little we truly understand...
Life is an adventure, let yours begin with Cacti!
Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages
For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?
A tool would be handy, since the old removespikes.pl script does not work with rrdtool 1.2 (I just tried it last night). I had to manually export my rrd to an XML file, remove the spiked entries, and then import it back into the rrd. Since then I have also updated my DS max value, so this hopefully won't happen again when the router is restarted.
| Scripts: Monitor processes | RFC1213 MIB | DOCSIS Stats | Dell PowerEdge | Speedfan | APC UPS | DOCSIS CMTS | 3ware | Motorola Canopy |
| Guides: Windows Install | [HOWTO] Debug Windows NTFS permission problems |
| Tools: Windows All-in-one Installer |
If you're using rrdtool 1.2 rrd files, then the only way (that I know of) to fix spikes is to dump the database to an XML file, edit out the spikes, then reimport it into an rrd file. I heard one of the devs was possibly making a new script that did this....
As I mentioned above, if the device or service needs a reboot or restart once or twice a day, I can't do this "remove spike" process manually every day. It needs to be an automatic process, or it needs to be solved in rrdtool itself.
BSOD2600 wrote: If you're using rrdtool 1.2 rrd files, then the only way (that I know of) to fix spikes is dump the database to an xml file, edit out the spikes, then reimport it to rrd file. I heard one of the devs was possibly making a new script that did this....
Min/Max thresholds in data sources
During our testing phase, we encountered the same problems mentioned in this thread. We also didn't have much luck using the suggested fixes like removespikes.pl and rrdtool tune. We're now moving toward production with a clean install, and I would like to ensure our data sources are adjusted before we start polling interface statistics.
I'd like to recommend omitting the spikes. Define a MAX value at the Data Source level in the Data Template. Readings above that value will be set to NaN during rrdtool update. To do this for existing rrd's, you'll have to use rrdtool tune <file> --maximum <ds>:<value>
Reinhard
Software:
Debian, kernel 2.6.8
Sources: Testing (Etch)
Apache 2.0.55
PHP 4.4.2
MySQL 4.1.15
RRD 1.2.11
Cacti 0.8.6h
Hardware:
HP 2000r
Intel 10/100 NIC
1. We will be tracking In/Out Bits w/ Total Bandwidth on:
Intel 10/100 NICs connected to a 10/100 (store and forward) switch.
During our testing, we had the following stats:
Inbound: Average 21.97k Maximum 1.12M
Outbound: Average 61.85k Maximum 8.92M
Our spikes are caused by nightly backups. We expect this to be the highest throughput we'll ever see in production as well.
If that turns out to be the case, would setting the data source maximums to 5000000 (5 million) lop off the spike and pull our other data in the graphs up to a point that it's useful? Lower? Higher?
I am not a network guru, just a sys admin, so I would like some confirmation based on the above info. Does this look good, bad, ugly?
If I've neglected to include any pertinent information, please let me know.
Thanks for any help and for a great tool!
Chris
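A rough way to sanity-check a candidate maximum, assuming the data source stores octets (bytes) per second while the graph shows bits per second, as Cacti's stock interface traffic templates usually do: a 100 Mbit/s port can carry at most 100,000,000 / 8 = 12,500,000 octets per second, so a maximum at or near that figure discards only physically impossible readings. If the 8.92M peak above is bits per second, it is about 1,115,000 octets per second, which is real backup traffic and sits below a 5,000,000 maximum, so that maximum would not lop it off. The sketch below is not from the thread; the function names and the `traffic_in` data source name are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: derive a data source maximum from the link speed and apply it
# with "rrdtool tune". Function names here are hypothetical helpers.

# max_octets_per_sec MBIT: line rate in octets (bytes) per second.
max_octets_per_sec() {
    echo $(( $1 * 1000000 / 8 ))
}

# tune_max FILE DS MAX: set the maximum of one data source in one rrd file.
# Set RRDTOOL=echo for a dry run that only prints the arguments.
tune_max() {
    "${RRDTOOL:-rrdtool}" tune "$1" --maximum "$2:$3"
}

# Example (data source name "traffic_in" is an assumption; check yours
# with "rrdtool info FILE"):
#   tune_max some_interface.rrd traffic_in "$(max_octets_per_sec 100)"
```

Note that a maximum only rejects out-of-range readings on future updates; it does not rescale a graph around a genuine peak such as a nightly backup.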
-
- Cacti User
- Posts: 60
- Joined: Mon Jul 18, 2005 7:01 pm
http://forums.cacti.net/viewtopic.php?t=13070
Any news on this topic?
I am still looking for the remove spikes feature in Cacti. I know that there are other projects, and after those there are projects that everyone wants to get to, and after those are more projects that someone else asked you to get to, and then finally after all of that is this project. I understand, and I do not envy the workload that being a Cacti developer must put on your desk. I do however envy how cool it is to be able to write even some of the code required to make this tool work.
However, this is a very, very serious problem, at least for me. I have been working with Cacti for almost a year now. Before that was a slew of lesser applications, such as MRTG based apps and commercial apps (e.g. Fluke OptiView Console, PRTG, OpenView, CiscoWorks, etc.).
They all fail to exhibit the potential that Cacti has. Even now I prefer Cacti to any of those apps. I prefer it for many reasons, but there is one thing that keeps me from being able to replace any of them for good.
Those damn spikes. They show up in my graphs despite tuning them (even taking into account bit to byte conversions), despite using MAX values, etc. This new RRD Editor (http://forums.cacti.net/viewtopic.php?t=13070) is cool, but slow, and it only does 1 file at a time. It is in essence a GUI for the old removespikes script, or at least that is the way I use it.
I have no complaints about the performance of the Cacti application; in fact I praise it to everyone I talk to about this (and that is a fair number of people, since I am a sr network consultant). But I have to say that without a built-in way to deal with spikes in the graphs, this can never go prime time.
That is just my $.02.
EV
Spikes - maybe another possible solution
Under the many options of rrdtool (rrdgraph) there is an option for upper and lower limits used for graphing:
Limits
[-u|--upper-limit value] [-l|--lower-limit value] [-r|--rigid]
By default the graph will be autoscaling so that it will adjust the y-axis to the range of the data. You can change this behaviour by explicitly setting the limits. The displayed y-axis will then range at least from lower-limit to upper-limit. Autoscaling will still permit those boundaries to be stretched unless the rigid option is set.
Why don't we use that with the max value, that is used for storing data in the rrd's?
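For reference, the options quoted above are passed on the rrdtool graph command line. A minimal sketch, assuming a single data source drawn as a line; `graph_clipped`, the colour, and the file and data source names in the example are made up for illustration and are not how Cacti itself builds its graph commands.

```shell
#!/bin/sh
# Sketch: draw one data source with a fixed y-axis ceiling.
# graph_clipped FILE DS UPPER OUT
#   FILE  rrd file to read
#   DS    data source name inside FILE
#   UPPER y-axis upper limit (same unit as the stored data)
#   OUT   image file to write
# Set RRDTOOL=echo for a dry run that only prints the arguments.
graph_clipped() {
    "${RRDTOOL:-rrdtool}" graph "$4" \
        --lower-limit 0 --upper-limit "$3" --rigid \
        "DEF:v=$1:$2:AVERAGE" \
        "LINE1:v#0000FF:$2"
}

# Example (hypothetical names):
#   graph_clipped traffic.rrd traffic_in 12500000 out.png
```

With --rigid the y-axis stops at the upper limit, so a spike is cut off at the top of the graph instead of flattening everything else. The spike stays in the rrd file; only the display changes.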
-
- Cacti User
- Posts: 60
- Joined: Mon Jul 18, 2005 7:01 pm
That is a good idea, I will actually try to see if that takes some of the spikes out of the graphs, at least visually. It would be nice if we could change this per graph, or better yet on a selection of graphs at once from within Cacti.
It would not eliminate the actual spikes from the data, but it would stop them from displaying.
I have tried messing around with the Graph Templates -> Graph Template -> Rigid Boundaries and Upper Limit settings, and they do NOTHING after an RRD is created as far as I can tell. They do not remove spikes, they don't stop them from showing up down the road, nothing.
Changing it per RRD file by command line seems to be the only thing that works.
That needs to be fixed.
-
- Cacti User
- Posts: 60
- Joined: Mon Jul 18, 2005 7:01 pm
Any changes, adds, deletions on this topic?
I see mention of remove spike problems in threads all the time. This tells me that this is a very real and very prevalent problem, and we are not the only ones talking about it. Plus, if the complainers are just the tip of the iceberg, this is a bigger problem than it seems.
I have graphs that had not taken a spike for 6 months, then one day took a spike of 100 times the normal values. Now when you look at those graphs you see nothing but a single spike.
I have others that have so many spikes in them, all days apart, that it almost looks normal. I have tried removespikes.pl, but while it used to work somewhat, it now breaks things and the graphs do not work after I use it. Plus I do 1 minute polling, so it takes FOREVER for it to finish 1 RRD. Usually several polling cycles (sometimes up to 20) are missed. I now have spikes and giant divots in my graphs.
I rrdtune the bad ones when I find them, but I cannot do this each and every time I make a new interface. 1 switch is 24-388 interfaces. The time it would take to rrdtune each one individually to its desired size is staggering. If there were some place in the GUI to add this value (and there is, sort of, already, in the Graph Template), or better yet if there were a way to set this individually for each data source (preferably in a way which could be batched), that would be amazingly useful.
If there were a way I could keep the 3 months to 1 year of data I already have and still remove the spikes from it, this would be the single greatest upgrade achievement put forth so far.
Please please please consider this for the next release.
thanks,
EddieVenus
You guys whose graphs stop updating after you run removespikes.pl (you get a whole bunch of NaN entries), you need to change the permissions on your RRD file back to the way they were. Running removespikes.pl as root recreates the RRD with owner and group of root and probably the wrong chmod values!
chown nagios:nagios bla.rrd
chmod 755 bla.rrd
To those of you who don't want to use the script, or find it chopping off data where you don't want it to, you can use rrdtool to dump your rrd's to XML and then edit them manually. Do this:
Dump the rrd to xml:
#> rrdtool dump bla.rrd > bla.xml
Edit the values you want:
#> vi bla.xml
Recreate the rrd file:
#> rrdtool restore bla.xml bla-new.rrd
Good luck!
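The manual edit step above can be partly scripted. A minimal sketch, assuming each `<row>` of the dump sits on one line and that any stored value above a chosen limit is a spike; `clip_spikes` is a hypothetical helper, not part of rrdtool, and it only touches the archived rows, not the data source header values.

```shell
#!/bin/sh
# Sketch: replace every archived value above LIMIT with NaN in an
# "rrdtool dump" XML stream. Reads XML on stdin, writes XML on stdout.
# clip_spikes LIMIT
clip_spikes() {
    awk -v limit="$1" '
    /<row>/ {
        out = ""; rest = $0
        # Walk through each <v>...</v> cell on the row line.
        while (match(rest, /<v>[^<]*<\/v>/)) {
            cell = substr(rest, RSTART, RLENGTH)
            val  = substr(rest, RSTART + 3, RLENGTH - 7)
            # Skip cells that are already NaN; compare the rest numerically.
            if (val ~ /[0-9]/ && val + 0 > limit + 0) cell = "<v>NaN</v>"
            out  = out substr(rest, 1, RSTART - 1) cell
            rest = substr(rest, RSTART + RLENGTH)
        }
        $0 = out rest
    }
    { print }'
}

# Example, following the steps in the post above:
#   rrdtool dump bla.rrd | clip_spikes 1000000 > bla.xml
#   rrdtool restore bla.xml bla-new.rrd
```

After the restore, the new file still has to replace the original and get the original owner and permissions back, as described at the top of the post. Values set to NaN show up as gaps in the graph.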
-
- Posts: 17
- Joined: Mon Jul 16, 2007 10:11 am
Hello, I have been using removespikes.pl to clear some spikes from my graphs. I added a chown at the end (the last system call below) to fix the ownership of the new rrd file:
if ($cont == 0) { print "No peaks found.!\n"; }
else {
rename($ARGV[0],"$ARGV[0].old")
    or die "$0: Unable to rename $ARGV[0] - $!\n";
# system() returns 0 on success; $! is not a reliable failure test after it
system("rrdtool", "restore", "$tempfile.xml", $ARGV[0]) == 0
    or die "$0: Unable to execute the rrdtool restore on $ARGV[0] (status $?)\n";
# Added: hand the restored file back to the web server user
system("chown", "www-data:www-data", $ARGV[0]) == 0
    or die "$0: Unable to change ownership on $ARGV[0] (status $?)\n";
}
So while it's got rid of most of the spikes, there are still some really HUGE ones it didn't get rid of and I am struggling to understand the program's logic.
I don't understand the percentage variation. Surely I want to eliminate spikes with a large variation, but the way the comments read in the code suggests it will remove spikes UNDER a certain variation.
Basically I don't know what to set the limit to (-l parameter).
I enabled debugging and got this output:
Code: Select all
--percentages--
00:00--310/4528 = 6.84628975265018%
00:01--1678/4528 = 37.0583038869258%
00:02--1605/4528 = 35.4461130742049%
00:03--807/4528 = 17.8224381625442%
00:05--128/4528 = 2.82685512367491%
No peaks found.!
Can anyone tell me what this output means and what I should set the limit to?
Cheers,
Paul
-
- Posts: 17
- Joined: Mon Jul 16, 2007 10:11 am
Ok, so I took the plunge and set it to 3 to try and capture the "2.82" ones - it deleted a load of stuff but seems to have worked and now I get this:
Code: Select all
--percentages--
00:00--310/4528 = 6.84628975265018%
00:01--1702/4528 = 37.5883392226148%
00:02--1607/4528 = 35.4902826855124%
00:03--909/4528 = 20.0750883392226%
So I guess it's taken out the ones that were 2.82%. Still don't really understand it. 2.82% of what?
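One reading that is consistent with the numbers posted above, offered as a guess rather than a description of removespikes.pl itself: each line is a bin of samples, the last field of the key looks like the decimal exponent of the stored value, and the percentage is that bin's share of all samples (310 + 1678 + 1605 + 807 + 128 = 4528). Under that reading, a limit of 3 removes every bin holding less than 3% of the samples, which is why the 2.82% bin disappeared. The totals still add up to 4528 afterwards, which suggests the removed samples were replaced by values that fall into the other bins rather than dropped. The sketch below builds the same kind of histogram from a dump; `exponent_histogram` and its output format are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: count archived values per decimal exponent in an
# "rrdtool dump" XML stream (stdin) and print each bin's share.
exponent_histogram() {
    awk '
    /<row>/ {
        rest = $0
        # Values look like 1.2345678900e+03; pick out the "+03" part.
        while (match(rest, /[0-9]e[+-][0-9]+/)) {
            bin = substr(rest, RSTART + 2, RLENGTH - 2)
            count[bin]++; total++
            rest = substr(rest, RSTART + RLENGTH)
        }
    }
    END {
        for (b in count)
            printf "%s--%d/%d = %.2f%%\n", b, count[b], total, 100 * count[b] / total
    }' | LC_ALL=C sort
}

# Example:
#   rrdtool dump bla.rrd | exponent_histogram
```

A bin that holds only a small share of the samples and sits several exponents above the rest is the likely home of the spikes.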
-
- Posts: 17
- Joined: Mon Jul 16, 2007 10:11 am
Ok, so finally I have cleared out all my spikes, and now I have used "rrdtool tune" to set the maximum value in my existing rrd files. I used a little script to cycle through each one and set it for the relevant data source name:
Code: Select all
#!/usr/bin/perl
# Set max value in DHCP rrd files
use strict;
use warnings;

# rra directory and file pattern for this install
my @files = glob("/usr/share/cacti/site/rra/*dhcp_acks*.rrd");
foreach my $file (@files)
{
print "Updating $file\n";
# -a is the short form of --maximum; the argument is <ds-name>:<max>
system("rrdtool", "tune", $file, "-a", "dhcp_Acks:1000") == 0
    or warn "rrdtool tune failed on $file (status $?)\n";
}
I'll shut up now!
Yeah, I got caught by the chown too. I cleared spikes this morning and later discovered the graphs had stopped updating. After about 20 minutes of head scratching I decided to check whether the ownership of the files had changed to root, and indeed it had, so I have chowned them back to cacti and hope that fixes the problem.
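A quick check for the ownership problem described in this thread, as a sketch: list every rrd file that is not owned by the expected user. `list_foreign_rrds` is a hypothetical helper; the directory and user name in the example are taken from posts above and will differ between installs.

```shell
#!/bin/sh
# Sketch: print .rrd files under DIR that are NOT owned by USER.
# list_foreign_rrds DIR USER
list_foreign_rrds() {
    find "$1" -type f -name '*.rrd' ! -user "$2"
}

# Example (path and user are assumptions for your install):
#   list_foreign_rrds /usr/share/cacti/site/rra cacti
```

Running this after any spike-removal session shows which files the poller can no longer update, before the gaps appear in the graphs.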