Graphs just stopped graphing
Hello, I have a question. I have been running my instance of Cacti for months and upgraded to the latest version about a month back. Everything was working just fine until yesterday at noon, when all my graphs went to NaN. Does anyone have any idea why this would happen?
Thanks,
Scott
"Keep away from people who try to belittle your ambitions. Small people always do that, but the really great make you feel that you, too, can
become great."
RE
Well, I have checked all the obvious options, and before writing I also worked through your information on how to debug the NaNs.
Disk is not full,
cron is still running,
running the script from the command line shows data being gathered with no errors, and in the logs the rrdtool update is working correctly.
This is more a case of things just stopping for no good reason. Nothing changed, and nothing failed or stopped other than the counters. The server had not been touched for over a week before this happened. I have gone through the entire page on debugging NaNs, and everything works when I follow the instructions. It does not make much sense to me.
Scott
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
- Contact:
Re: RE
sedoshi wrote: ... in the logs the rrdtool update is working correctly. ...
A failure of rrdtool update will never show up in the logs. You will have to check this manually. Is the poller_output table filling up? Have you checked PHP memory? Were there any changes to MySQL, PHP, or rrdtool via yum/apt? Does running the poller manually as cactiuser yield valid results or NaN?
Reinhard
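One way to do that manual check is sketched below. It assumes the `rrdtool` binary is on the PATH and relies on `rrdtool last`, which prints the UNIX timestamp of a file's most recent update. The function names, the 300-second polling interval, and the directory passed to `scan` are assumptions made for illustration, not part of Cacti itself.

```python
import subprocess
import time
from pathlib import Path

POLL_INTERVAL = 300  # assumed polling interval in seconds; adjust to your setup


def classify(last_update, now, interval=POLL_INTERVAL):
    """Compare an RRD's last update time with the current time."""
    if last_update > now:
        return "future"  # the file was last updated while the clock was ahead
    if now - last_update > 2 * interval:
        return "stale"   # updates have not landed for more than two intervals
    return "ok"


def last_update(rrd_path):
    """Return the timestamp printed by `rrdtool last` for one file."""
    out = subprocess.run(
        ["rrdtool", "last", str(rrd_path)],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


def scan(rra_dir):
    """Print every RRD under rra_dir that is not being updated normally."""
    now = int(time.time())
    for rrd in sorted(Path(rra_dir).glob("*.rrd")):
        state = classify(last_update(rrd), now)
        if state != "ok":
            print(f"{state:6} {rrd}")
```

Calling `scan()` on the directory that holds your RRD files, as the user that runs the poller, lists the files that are stale or carry a last-update time in the future.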
RE
No, it is a BSD machine and I am the only user with rights on the server, so there were no upgrades or port changes by anyone. I looked into PHP memory issues as well, and that should be fine.
Running the poller, I get some zeros returned on some interfaces, but I have 380+ devices. This is a diverse network, so I have Extreme, Cisco, Tasman, and a few others being tracked. Most show counters gathering data but still show NaNs on the graphs. That is the confusing part. Nothing in the logs shows anything out of whack. I have had the log running in debug for a couple of days. I wrote a little script to zip the log every 15 minutes and move it to another directory so I don't fill that one up; that way I can keep an eye on the logs and have some sort of back reference. I am not seeing anything that has changed from prior logs either.
I had not checked the poller table, but when I looked at it, it does not appear to be full in any way that I can tell.
Thanks for the replies. I am not really sure why this would just stop. I do have a lot of larger devices in the system, but the poller times are still down in the 115-second range.
- TheWitness
- Developer
- Posts: 17059
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
- Contact:
Yes, playing with your clock by setting it forward and then back again after polling has taken place can be disastrous, especially if it's by more than an hour.
With RRDtool, if your time zone uses Daylight Saving Time, you always lose an hour of data twice a year, whichever dates apply. But if you set your clock forward a year or two, well, you might as well pull out your backup tapes.
Reinhard, this would be an excellent script, you know: an "I messed up my RRD files by playing with the system clock" utility. What do you think?
Larry
True understanding begins only when we realize how little we truly understand...
Life is an adventure, let yours begin with Cacti!
Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages
For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
- gandalf
TheWitness wrote: ... Reinhard, this would be an excellent script you know. A "I messed my RRDfiles by playing with the system clock utility". What do you think?
This has already been discussed "some very few times" on the rrdtool-users mailing list. The answer is always: dump the rrd file, edit the "wrong" data, and reload it. There seems to be no contrib for this so far, except for the rrd file editor published somewhere. But going by its description, that is a GUI tool only and not suitable for mass changes.
The drawback, in my opinion, is that you may not be able to tell the timespan of the "wrong" data just from analyzing the rrd file. The best approach I'm currently aware of would be: take the highest timestamp, scroll backwards until you find a gap, and scratch that timespan. Comments are welcome.
I'm not happy about asking users dumb questions, but I suppose it would be better to implement a prompt before deleting stuff (of course, I would create a copy of the rrd file first).
Another bad point: depending on how much time passes before the error is discovered, live data will be lost (as rrdtool update throws it away). I do not know of any way to avoid this.
Please comment.
Reinhard
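The "scroll backwards until you find a gap" idea can be sketched roughly as follows. This is only one reading of the heuristic, not an existing tool: it assumes the rows of a single archive have already been extracted from `rrdtool dump` output into (timestamp, value) pairs, with NaN marking unknown data. The function names are hypothetical, and parsing the dump XML and reloading with `rrdtool restore` are left out.

```python
import math


def find_suspect_span(rows):
    """Find the trailing run of known values that follows a gap.

    rows: list of (timestamp, value) in ascending time order, where NaN
    marks unknown data. Returns (start_ts, end_ts) or None if no such run.
    """
    i = len(rows) - 1
    # Walk backwards from the newest row while values are known.
    while i >= 0 and not math.isnan(rows[i][1]):
        i -= 1
    if i < 0 or i == len(rows) - 1:
        return None  # no gap found, or the newest row is itself unknown
    return rows[i + 1][0], rows[-1][0]


def scratch(rows, span):
    """Return a copy of rows with every value inside span replaced by NaN."""
    start, end = span
    return [
        (ts, float("nan")) if start <= ts <= end else (ts, value)
        for ts, value in rows
    ]
```

As noted above, a real utility should work on a copy of the rrd file and ask for confirmation before scratching anything, since the heuristic cannot tell a legitimate outage from a clock jump.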
RE: Working again
OK, so I posted a message about this on Monday, but it looks as though it did not post. On Friday afternoon around 2pm the graphs just started working again. As far as DST and the time go, there was a time change, but that would have happened Saturday night/Sunday morning. And since this is a BSD 6.2 server, I did not have to mess with the time zone files on it. I am not sure what happened here. The one thing I did do was rebuild the poller cache; I am not sure whether that had anything to do with it, but that is pretty much all I did. Everything seems to be working perfectly again. I am still looking at things, trying to find what caused the hiccup. In any case, I appreciate the help, and I am sorry that I have not posted again until today. It is working again, and I have no idea what caused the original issue.
Scott
I'm having the same issue, though only about half of my graphs stopped graphing. The rest are fine. They all stopped graphing at the same time though. I ran through the NaN debugging but didn't see anything strange and I haven't updated anything recently. I did run an ntpdate about an hour before, so I'm thinking the time may have been changed, corrupting some of the rrds. But I would think this would affect all graphs and not just those in another data center. This is very strange.
-
- Posts: 6
- Joined: Tue Aug 01, 2006 4:37 pm
I am having a similar problem where only the "Bits per second with total bandwidth" graphs are getting NaNs, but packets per second and everything else is going through. I have another installation of Cacti set up that doesn't have the same problem. I'm using the same version of rrdtool and net-snmp. I have followed the NaN guide and can't see any reason for this.
- gandalf
pheezy wrote: I'm having the same issue, though only about half of my graphs stopped graphing. The rest are fine. They all stopped graphing at the same time though. I ran through the NaN debugging but didn't see anything strange and I haven't updated anything recently. I did run an ntpdate about an hour before, so I'm thinking the time may have been changed, corrupting some of the rrds. But I would think this would affect all graphs and not just those in another data center. This is very strange.
Not necessarily. This depends on the time offset you introduced and on the number of rrds that had already been updated at that point in time.
But of course there may be some other reason. Please open a new thread with more details.
Reinhard
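A toy model of why a clock change can hit only some rrds is sketched below. It assumes only the RRDtool rule that an update is rejected unless its timestamp is newer than the file's last update; the file names, timestamps, and the one-hour offset are made up for illustration.

```python
def apply_update(last_updates, name, ts):
    """Accept an update only if ts is newer than the file's last update."""
    if ts <= last_updates[name]:
        return False
    last_updates[name] = ts
    return True


# Four hypothetical rrd files, all last updated at t=1000.
last_updates = {"dc1_a": 1000, "dc1_b": 1000, "dc2_a": 1000, "dc2_b": 1000}
offset = 3600  # the clock is one hour ahead during part of the next cycle

# Polling cycle at real time 1300: two files are updated while the clock
# is still ahead, the other two after the clock has been corrected.
for name in ("dc1_a", "dc1_b"):
    apply_update(last_updates, name, 1300 + offset)
for name in ("dc2_a", "dc2_b"):
    apply_update(last_updates, name, 1300)

# Next cycle at real time 1600: only files with a sane last update accept data.
results = {name: apply_update(last_updates, name, 1600) for name in list(last_updates)}
print(results)
```

In this model the two files touched while the clock was ahead keep rejecting updates until real time passes their recorded last update, while the others carry on normally.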