Graphs just stopped graphing

Post general support questions here that do not specifically fall into the Linux or Windows categories.

Moderators: Developers, Moderators

sedoshi
Posts: 28
Joined: Tue Sep 06, 2005 12:37 pm
Location: Cincinnati, Ohio
Contact:

Graphs just stopped graphing

Post by sedoshi »

Hello, I have a question. I have been running my instance of Cacti for months and upgraded to the latest version about a month ago. Everything was working fine until yesterday at noon, when all my graphs went to NaN. Does anyone know why this would happen?

Thanks,
Scott
"Keep away from people who try to belittle your ambitions. Small people always do that, but the really great make you feel that you, too, can
become great."
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany
Contact:

Post by gandalf »

A full disk, a stopped crond, and many other things can cause this. You will find some debugging help at the documentation site (see the tab at the top of this page or the second link in my signature).
Reinhard
sedoshi
Posts: 28
Joined: Tue Sep 06, 2005 12:37 pm
Location: Cincinnati, Ohio
Contact:

RE

Post by sedoshi »

Well, I checked all the obvious options and also worked through your NaN debugging information before writing.
The disk is not full,
cron is still running,
running the script from the command line shows data being gathered with no errors, and in the logs the rrdtool update is working correctly.

This is more a case of things just stopping for no good reason. Nothing changed, and nothing failed or stopped other than the counters. The server had not been touched for over a week before this happened. I have gone through the entire page on debugging NaNs, and everything works when I follow the instructions. It does not make much sense to me.

Scott
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany
Contact:

Re: RE

Post by gandalf »

sedoshi wrote:... in the logs the rrdtool update is working correctly.
...
A failure of rrdtool update will never show up in the logs; you will have to check this manually. The poller_output table is not filling up? Have you checked PHP memory? Were there any changes to MySQL, PHP, or rrdtool via yum/apt? Does running the poller manually as cactiuser yield valid results or NaN?
Reinhard
sedoshi
Posts: 28
Joined: Tue Sep 06, 2005 12:37 pm
Location: Cincinnati, Ohio
Contact:

RE

Post by sedoshi »

No, it is a BSD machine and I am the only user with rights on the server, so there were no upgrades or port changes by anyone. I also looked into PHP memory issues, and that should be fine as well.

Running the poller, I get some zeros returned on some interfaces, but I have 380+ devices. This is a diverse network, so I have Extreme, Cisco, Tasman, and a few others being tracked. Most show counters gathering data but still show NaNs on the graphs. That is the confusing part. Nothing in the logs shows anything out of whack. I have had the log running in debug mode for a couple of days. I wrote a little script to zip the log every 15 minutes and move it to another directory so I don't fill that one up, which lets me keep an eye on the logs and have some sort of back reference. I am not seeing anything that has changed from prior logs either.

I had not checked the poller table, but when I looked at it, it does not appear to be full in any way that I can tell.

Thanks for the replies. I am not really sure why this would just stop. I do have a lot of larger devices in the system, but the poller times are still down in the 115-second range.
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany
Contact:

Post by gandalf »

There is one weird idea: if there had been an update "in the future", e.g. with some higher timestamp, rrdtool will refuse to accept updates at "current" timestamps. You will have to check the rrd file's last update timestamp and convert it into a human-readable format.
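
The check described above can be sketched in a few lines of Python. This is only an illustration, not something from the Cacti or RRDtool sources: the function names are made up, and the epoch value is assumed to come from something like `rrdtool last <file>`, which prints a file's last update time as a Unix timestamp. The acceptance rule shown (a new timestamp must be strictly later than the last update) is the behaviour the post refers to.

```python
from datetime import datetime, timezone


def update_would_be_accepted(last_update: int, new_timestamp: int) -> bool:
    """An update is only accepted if it is strictly later than the last one."""
    return new_timestamp > last_update


def describe_last_update(last_update: int, now: int) -> str:
    """Render the last update time readably and compare it with 'now'.

    Both arguments are Unix epoch seconds. The names are hypothetical.
    """
    stamp = datetime.fromtimestamp(last_update, tz=timezone.utc)
    readable = stamp.strftime("%Y-%m-%d %H:%M:%S UTC")
    if last_update > now:
        ahead = last_update - now
        return f"{readable}: {ahead} s in the future, current updates will be refused"
    return f"{readable}: {now - last_update} s ago"


if __name__ == "__main__":
    # Example: the file claims an update 1000 s after the current time.
    print(describe_last_update(1000000000, 999999000))
```

If the reported time is ahead of the system clock, every poll with a "current" timestamp is dropped until the clock catches up, which would match graphs that resume on their own after a while.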
Reinhard
TheWitness
Developer
Posts: 17059
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA
Contact:

Post by TheWitness »

Yes, playing with your clock by setting it forward and then back again after polling has taken place can be disastrous, especially if it's by more than an hour.

In RRDtool, if your time zone uses Daylight Saving Time, I don't care which date it is, you always lose an hour of data twice a year. But if you set your clock forward a year or two, well, you might as well pull your backup tapes.

Reinhard, this would be an excellent script you know. A "I messed my RRDfiles by playing with the system clock utility". What do you think?

Larry
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.

For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany
Contact:

Post by gandalf »

TheWitness wrote:...
Reinhard, this would be an excellent script you know. A "I messed my RRDfiles by playing with the system clock utility". What do you think?

Larry
This has already been discussed "some very few times" on the rrdtool-users mailing list. The answer is always: dump the rrd file, edit the "wrong" data, and reload it. There seems to be no contrib so far (except for the rrd file editor published somewhere, but from its description this is a GUI tool only and not suitable for mass changes).
The drawback, in my opinion, is that you may not be able to determine the timespan of the "wrong" data just by analyzing the rrd file. The best approach I'm currently aware of would be: take the highest value, scroll backwards until you find a gap, and scratch this timespan. Comments are welcome.
I'm not happy about asking users dumb questions, but I suppose it would be better to implement a prompt before deleting stuff (of course, I would create a copy of the rrd file first).
Another bad item: depending on the time that passes before the error is discovered, live data will be lost (as rrdtool update throws it away). I do not know of any approach to avoid this.
Please comment.
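
To make the "scroll backwards until you find a gap" idea concrete, here is a rough Python sketch. It assumes the rows of one archive have already been extracted from a dump into (timestamp, value) pairs sorted by time, with NaN marking unknown rows, and it reads "highest value" as the row with the highest timestamp. The function name and data layout are hypothetical; this is not an existing contrib.

```python
import math
from typing import List, Optional, Tuple

Row = Tuple[int, float]  # (epoch timestamp, value; NaN means unknown)


def trailing_run_after_gap(rows: List[Row]) -> Optional[Tuple[int, int]]:
    """Return (first_ts, last_ts) of the newest block of known values
    that is separated from older data by a gap of NaN rows.

    Returns None if there are no known values, or if no gap is found,
    because then the "wrong" span cannot be told apart from good data.
    """
    end = len(rows) - 1
    # Skip unknown rows at the very end of the archive.
    while end >= 0 and math.isnan(rows[end][1]):
        end -= 1
    if end < 0:
        return None
    # Walk backwards while the previous row still holds a known value.
    start = end
    while start > 0 and not math.isnan(rows[start - 1][1]):
        start -= 1
    if start == 0:
        return None  # reached the oldest row without meeting a gap
    return rows[start][0], rows[end][0]


if __name__ == "__main__":
    nan = float("nan")
    sample = [(300, 1.0), (600, 2.0), (900, nan), (1200, nan),
              (1500, 5.0), (1800, 6.0)]
    print(trailing_run_after_gap(sample))
```

The function only reports the candidate span. Whether to scratch it stays a decision for the user, on a copy of the rrd file, in line with the prompt suggested above.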
Reinhard
sedoshi
Posts: 28
Joined: Tue Sep 06, 2005 12:37 pm
Location: Cincinnati, Ohio
Contact:

RE: Working again

Post by sedoshi »

OK, so I posted a message about this on Monday, but it looks as though it did not post. On Friday afternoon around 2 pm the graphs just started working again. As far as DST and time go, there was a time change, but that would have happened Saturday night/Sunday morning. And since this is a BSD 6.2 server, I did not have to mess with the time zone files. I am not sure what happened here. The one thing I did do was rebuild the poller cache; I am not sure whether that had anything to do with it, but it is pretty much all I did. Everything seems to be working perfectly again. I am still looking at things, trying to find what caused the hiccup. In any case, I appreciate the help, and I am sorry that I did not post again until today. It is working again, and I have no idea what caused the original issue.

Scott
pheezy
Cacti User
Posts: 61
Joined: Thu Oct 26, 2006 5:30 pm

Post by pheezy »

I'm having the same issue, though only about half of my graphs stopped graphing. The rest are fine. They all stopped graphing at the same time though. I ran through the NaN debugging but didn't see anything strange and I haven't updated anything recently. I did run an ntpdate about an hour before, so I'm thinking the time may have been changed, corrupting some of the rrds. But I would think this would affect all graphs and not just those in another data center. This is very strange. :(
TheMaskedMan
Posts: 6
Joined: Tue Aug 01, 2006 4:37 pm

Post by TheMaskedMan »

I am having a similar problem where only the "bits per second with total bandwidth" graphs are getting NaNs, while packets per second and everything else are going through. I have another installation of Cacti set up that doesn't have the same problem, and I'm using the same versions of rrdtool and net-snmp. I have followed the NaN guide and can't see any reason for this.
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany
Contact:

Post by gandalf »

pheezy wrote:I'm having the same issue, though only about half of my graphs stopped graphing. The rest are fine. They all stopped graphing at the same time though. I ran through the NaN debugging but didn't see anything strange and I haven't updated anything recently. I did run an ntpdate about an hour before, so I'm thinking the time may have been changed, corrupting some of the rrds. But I would think this would affect all graphs and not just those in another data center. This is very strange. :(
Not necessarily. This depends on the time offset you've introduced and on the number of rrd's that were already updated at this point in time.
But surely there may be some other reason. Please open a new thread with more details.
Reinhard