poller not running, rrd's filling with nan

snm77
Posts: 6
Joined: Wed Dec 09, 2009 4:45 pm

Post by snm77 »

I've spent about 10 hours today fighting my Cacti 0.8.7b install, reading this forum, and basically making a hash of my Cacti instance (my own fault). It runs on CentOS 5 in an ESX VM.

I started the day with several devices that I defined years ago working, their graphs recording, but everything new I added simply stayed in the status "Unknown".
I attempted to follow the advice here:
http://docs.cacti.net/manual:087:4_help ... #debugging
I did not read far enough down: I ran /usr/bin/spine --verbosity=5 from the command line 3 times within 10 minutes, which made all my working graphs stop working. Rather than attempt to fix all the RRDs I'd just broken, I decided to start over. When things work properly, it only takes me a few hours to put in the devices I need, and creating graphs from there is pretty simple :)

At this point, I have removed all the devices and attempted to re-create them. They stay "Unknown" unless I run spine manually. I have verified that /etc/cron.d/cacti is set to call poller.php every 5 minutes, just as in the example given in the link above. If I run spine as root, the devices all show up as Up and Cacti creates a few graphs, but all stay empty. I get errors if I run spine as cacti.

To summarize:
Check Cacti Log File
Setting the log level to debug, I never see the poller run unless I run it. I see no snmp timeouts.

Check Basic Data Gathering
I can do SNMP gets to any device I try.
I have to confess I don't know which .pl to run to check Perl processing; I need more guidance here.

Check Cacti's Poller
I have not tried cmd.php, because we've been using spine for quite some time.
Setting to debug, I never see spine process, and I see no errors as to why in the cacti log.
If I run spine manually (as root), it appears to process fine and my devices change to Up, but the RRDs fill with nan.
The cacti user (cacti) has the correct crontab settings.
Running spine as the cacti user from the command prompt results in errors about not running as root.

Check MySQL Update
No idea how to do this - if it helps, I see no SQL errors in the cacti log.

Check RRD File Update
Files are updating every time I run spine. Updating with nans, but they are updating.

Check RRD File Ownership
All the rrd's are owned by cacti.

Check RRD File Numbers
When I ran this, there were no ds[loss]. entries. There are other .min and .max entries, and they are correct now. I think running spine 3 times in 10 minutes caused the problem described in this section of the troubleshooting instructions, but at this point I've deleted and re-created all these files, and all appear fine now. Other than filling with nans :)

Check RRDTool Graph Statement
Graph Statements are fine.

I checked for multiple crontab entries; the only crontab that had poller.php in it was the cacti crontab.


Oh, I deleted all the old RRDs as well, not just the devices and graphs. There were some abandoned RRDs from several years ago that really needed to go anyway.

I have not modified the poller.php in any way that I am aware of - and the fact that before I started messing with things my old graphs were working indicates that at least SOMETHING was happening.

Other things to note: I shut off three core switches about three weeks ago, and Cacti said they were Up before I deleted their objects this morning, indicating that spine has not been running for some time.

We just forklifted our entire LAN. I thought perhaps I was having SNMP issues, but in fact when I run a verbose query of any device, I get full detail.

Any help would be appreciated. Cisco sold us CiscoWorks along with our forklift upgrade, assuring us that it would do what Cacti does and more. It does not. I need to get things going in here soon.

Post by snm77 »

Ok, an update. I switched over to cmd.php, changed the crontab to:
*/5 * * * * cacti /usr/bin/php -q /var/www/cacti/poller.php > /dev/null 2>&1

and I'm getting some graphs. On the graphs that are not populating, the RRDs are updating with something other than nans. I may just need to give it enough time to gather enough data to populate the graph; we'll see. I still do not know why spine does not seem to work.
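A side note on the crontab line above: it carries a user name (cacti) as its sixth field. That layout belongs to system crontabs such as /etc/crontab and files under /etc/cron.d; a per-user crontab edited with crontab -e has no user field, and a line like this placed there would try to run "cacti" as the command. Since both /etc/cron.d/cacti and "the cacti crontab" are mentioned in this thread, it may be worth confirming which layout each file uses. The sketch below is only a heuristic and assumes the command starts with an absolute path; classify_cron_line is a made-up helper name, not part of Cacti or cron.

```python
def classify_cron_line(line):
    """Guess which crontab layout a single job line follows.

    Heuristic (an assumption of this sketch): the command starts with an
    absolute path, so a sixth field that does not start with "/" is taken
    to be a user name.
    """
    stripped = line.strip()
    # Skip blank lines and comments.
    if not stripped or stripped.startswith("#"):
        return "not a job line"
    fields = stripped.split()
    # Five schedule fields plus at least one more are needed for a job.
    if len(fields) < 6:
        return "not a job line"
    sixth = fields[5]
    if sixth.startswith("/"):
        return "per-user layout (no user field)"
    return "system layout (user field: %s)" % sixth


# The line quoted in this post.
line = "*/5 * * * * cacti /usr/bin/php -q /var/www/cacti/poller.php > /dev/null 2>&1"
print(classify_cron_line(line))
```

If the same line shows up in the output of crontab -l for the cacti user, the user field would need to be dropped there.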

That said, I added a new entry, and it's been through three polling cycles and the system has not yet created the RRDs. Checking the poller cache, I see that the graphs I created have a data source entry with an RRD name; the RRD simply does not exist yet. This was my original problem: it appears that I cannot add anything new to Cacti and have the graphs populate. The new device is in status "Unknown" and won't come out, the RRDs are not getting created, and even at debug logging level in the Cacti log, I'm not seeing anything that looks like an error.

I hope this helps someone more familiar with cacti figure out where my problem is. I'm still looking for answers in the forums, but so far, no luck.

Post by snm77 »

More info. Here is the log entry from a graph that is working, followed by rrdtool fetch output for the file containing its data, followed by the log entry of a graph that is NOT working, along with its rrdtool fetch output.

WORKING:
02/25/2011 03:00:00 PM - WEBLOG: Poller[0] CACTI2RRD: /usr/bin/rrdtool graph - --imgformat=PNG --start=1298577598 --end=1298663998 --title="il21921e - Traffic - Gi0/0" --rigid --base=1000 --height=120 --width=500 --alt-autoscale-max --lower-limit=0 COMMENT:"From 2011/02/24 14\:59\:58 To 2011/02/25 14\:59\:58\c" COMMENT:" \n" --vertical-label="bits per second" --slope-mode --font TITLE:12: --font AXIS:8: --font LEGEND:10: --font UNIT:8: DEF:a="/var/www/cacti/rra/il21921e_traffic_in_3230.rrd":traffic_in:MAX DEF:b="/var/www/cacti/rra/il21921e_traffic_in_3230.rrd":traffic_out:MAX CDEF:cdefa=a,8,* CDEF:cdefe=b,8,* CDEF:cdefi=a,UN,INF,UNKN,IF AREA:cdefa#00CF00FF:"Inbound" GPRINT:cdefa:LAST:" Current\:%8.2lf %s" GPRINT:cdefa:AVERAGE:"Average\:%8.2lf %s" GPRINT:cdefa:MAX:"Maximum\:%8.2lf %s\n" LINE1:cdefe#002A97FF:"Outbound" GPRINT:cdefe:LAST:"Current\:%8.2lf %s" GPRINT:cdefe:AVERAGE:"Average\:%8.2lf %s" GPRINT:cdefe:MAX:"Maximum\:%8.2lf %s" AREA:cdefi#8F9286FF:""

rrdtool fetch il21921e_traffic_in_3230.rrd AVERAGE
...
1298660400: 1.7424373068e+01 5.3655554525e+01
1298660700: 1.5376251093e+01 5.5694811918e+01
1298661000: 1.7348575839e+01 5.3718166890e+01
1298661300: 1.5064133333e+01 5.5149933333e+01
1298661600: 1.6405000000e+01 5.3706733333e+01
1298661900: 1.5258303630e+01 5.5623632343e+01
1298662200: 1.7133191319e+01 5.3537140047e+01
1298662500: 1.4864171717e+01 5.4949827609e+01
1298662800: 1.7477263566e+01 5.3527972093e+01
1298663100: 1.5124398641e+01 5.4831037049e+01
1298663400: 1.7147537793e+01 5.5425190858e+01
1298663700: 1.5060354054e+01 5.4694206306e+01
1298664000: 1.7771512613e+01 5.5719682583e+01
1298664300: 1.5233822222e+01 5.6544511111e+01
1298664600: nan nan

NOT WORKING:
02/25/2011 03:00:01 PM - WEBLOG: Poller[0] CACTI2RRD: /usr/bin/rrdtool graph - --imgformat=PNG --start=1298577598 --end=1298663998 --title="il21921e - Traffic - Gi0/0" --rigid --base=1000 --height=120 --width=500 --alt-autoscale-max --lower-limit=0 COMMENT:"From 2011/02/24 14\:59\:58 To 2011/02/25 14\:59\:58\c" COMMENT:" \n" --vertical-label="bits per second" --slope-mode --font TITLE:12: --font AXIS:8: --font LEGEND:10: --font UNIT:8: DEF:a="/var/www/cacti/rra/il21921e_traffic_in_3225.rrd":traffic_in:AVERAGE DEF:b="/var/www/cacti/rra/il21921e_traffic_in_3225.rrd":traffic_in:MAX DEF:c="/var/www/cacti/rra/il21921e_traffic_in_3225.rrd":traffic_out:AVERAGE DEF:d="/var/www/cacti/rra/il21921e_traffic_in_3225.rrd":traffic_out:MAX CDEF:cdefa=a,8,* CDEF:cdefd=b,8,* CDEF:cdefh=c,8,* CDEF:cdefba=d,8,* CDEF:cdefbb=a,UN,INF,UNKN,IF CDEF:cdefbg=TIME,1298663701,GT,a,a,UN,0,a,IF,IF,TIME,1298663701,GT,c,c,UN,0,c,IF,IF,+ AREA:cdefa#157419FF:"In" GPRINT:cdefa:LAST:" Cur\:%8.2lf %s" GPRINT:cdefa:AVERAGE:"Av\:%8.2lf %s" GPRINT:cdefd:MAX:"Max\:%8.2lf %s" COMMENT:"Transfer\: 345.58 KB" HRULE:|95\:bits\:0\:current|#00FF00FF:"95%\:" COMMENT:"|95\:bits\:6\:current| mbit\n" LINE1:cdefh#002A8FFF:"Out" GPRINT:cdefh:LAST:"Cur\:%8.2lf %s" GPRINT:cdefh:AVERAGE:"Av\:%8.2lf %s" GPRINT:cdefba:MAX:"Max\:%8.2lf %s" AREA:cdefbb#8F9286FF:"" COMMENT:"Transfer\: 1.16 MB" HRULE:|95\:bits\:0\:current|#0000FFFF:"95%\:" COMMENT:"|95\:bits\:6\:current| mbit\n" COMMENT:"\n" HRULE:589.1#FF0000FF:"Total 95%\:" COMMENT:"0 mbit" COMMENT:"Total Transfer\: 1.51 MB\n"

rrdtool fetch il21921e_traffic_in_3225.rrd AVERAGE
...
1298660400: 1.7424373068e+01 5.3655554525e+01
1298660700: 1.5376251093e+01 5.5694811918e+01
1298661000: 1.6502509172e+01 5.2741433557e+01
1298661300: 1.4759400000e+01 5.5097333333e+01
1298661600: 1.7532666667e+01 5.4715466667e+01
1298661900: 1.5281436964e+01 5.5644232343e+01
1298662200: 1.7133191319e+01 5.3537140047e+01
1298662500: 1.4864171717e+01 5.4949827609e+01
1298662800: 1.7477263566e+01 5.3527972093e+01
1298663100: 1.5124398641e+01 5.4831037049e+01
1298663400: 1.7147537793e+01 5.5425190858e+01
1298663700: 1.5060354054e+01 5.4694206306e+01
1298664000: 1.7771512613e+01 5.5719682583e+01
1298664300: 1.5233822222e+01 5.6544511111e+01
1298664600: nan nan
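For comparing fetch output like the two listings above without eyeballing every row, a small parser can count how many rows are entirely nan and report the newest timestamp that still has data. This is a minimal sketch that assumes the "timestamp: value value" layout shown here; summarize_fetch is a hypothetical helper name. A single trailing nan row, as in both listings, is usually just the interval that has not been completed yet.

```python
import math


def summarize_fetch(text):
    """Summarize pasted `rrdtool fetch` output.

    Returns the number of data rows, the number of rows whose values are
    all nan, and the newest timestamp that has at least one real value.
    """
    rows = []
    for line in text.splitlines():
        stamp, sep, values = line.partition(":")
        # Skip the header, blank lines, and "..." elisions.
        if not sep or not stamp.strip().isdigit():
            continue
        rows.append((int(stamp), [float(v) for v in values.split()]))

    nan_rows = [ts for ts, vals in rows
                if vals and all(math.isnan(v) for v in vals)]
    good = [ts for ts, vals in rows
            if any(not math.isnan(v) for v in vals)]
    return {
        "rows": len(rows),
        "nan_rows": len(nan_rows),
        "last_good": max(good) if good else None,
    }


# The last two rows of the listing above.
sample = """1298664300: 1.5233822222e+01 5.6544511111e+01
1298664600: nan nan
"""
print(summarize_fetch(sample))
```

In both files quoted here only the final row is nan, so on this evidence the data itself looks the same for the working and the non-working graph.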


I've tried deleting and re-creating the non-working graphs, and I've removed and re-added the hosts. I cannot use Ping to determine the status of a host; only SNMP allows the status to change from "Unknown" to "Up". All Unix Ping latency graphs work, a few interface graphs work, and every Cisco CPU graph I create works. I can query the interfaces via SNMP manually, no problem. The data appears to be in the correct RRD file, it's getting updated regularly now, and the details of each individual graph appear to point to the correct file. I'm totally stumped as to why they simply won't plot. All the ones that aren't working are just a red "x".
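One visible difference between the two logged statements is that the NOT WORKING one still contains tokens of the form |95\:bits\:0\:current|, including one used as the value of an HRULE, where rrdtool would need a number or a variable name. The log alone does not show whether Cacti substitutes these before it calls rrdtool, so this is only something to rule out, not a diagnosis. The sketch below lists such tokens in a pasted graph command; find_placeholders is a hypothetical helper name, and the pattern assumes the tokens contain no spaces or quotes.

```python
import re

# Matches |token| where the token has no spaces, quotes, or further pipes.
PLACEHOLDER = re.compile(r'\|([^|\s"]+)\|')


def find_placeholders(graph_command):
    """Return the |...| tokens found in an rrdtool graph command string."""
    return PLACEHOLDER.findall(graph_command)


# Excerpt copied from the NOT WORKING log entry above.
excerpt = r'HRULE:|95\:bits\:0\:current|#00FF00FF:"95%\:" COMMENT:"|95\:bits\:6\:current| mbit\n"'
for token in find_placeholders(excerpt):
    print(token)
```

Running the logged statement by hand from a shell, as cacti, would show whether rrdtool itself objects to anything in it.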

Any ideas I can try would be appreciated.
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

What is "not working"? Do you have an error message? What happens when you run the given rrdtool graph statement? Or do you have a screenshot?
R.

Post by snm77 »

Yes, sorry. I made the mistake of trying to describe what a picture shows much better.
I went into graph management for a particular graph and turned on debug mode to take this screenshot:
erroredgraph.jpg
Here is the same host, but with a graph that plots:
normalgraph.jpg
I'll post anything more to help clarify this.

At this point, my poller seems to have recovered; I just can't re-create the same graphs I had before. Thank you very much for answering :)

Post by gandalf »

Those screenshots show success, so they won't help debugging any issue not shown there.
R.