Abruptly stopped graphing - broken pipe to RRDtool?

egironda
Posts: 45
Joined: Mon Dec 19, 2005 6:44 pm

Post by egironda »

A lot of my graphs have abruptly stopped updating again, or they update for only one poll, very intermittently.

As far as I can tell, the poller is successfully collecting the data, and it seems to be inserting it in the rrd, but the rrd never gets updated with valid data. This is not the first time this has happened. Last time, I moved the rrds out and let them be re-created, and it seemed to work again.

I am running fully patched, current code on Debian. I've cleared the poller cache, and also checked to make sure the poller_output table was zeroed.

mysql Ver 14.7 Distrib 4.1.11, for pc-linux-gnu (i386)
RRDtool 1.2.12 Copyright 1997-2005 by Tobias Oetiker <tobi@oetiker.ch>
CACTID 0.8.6h Copyright 2002-2006 by The Cacti Group
Cacti 0.8.6h

Some debug:

06/12/2006 10:55:11 AM - CACTID: Poller[0] Host[156] RECACHE: Processing 1 items in the auto reindex cache for 'sf-apc-emu1'
06/12/2006 10:55:11 AM - CACTID: Poller[0] Host[156] DS[1770] SNMP: v1: sf-apc-emu1, dsname: apc, oid: 1.3.6.1.4.1.318.1.1.2.1.1.0, value: 27
06/12/2006 10:55:11 AM - CACTID: Poller[0] Host[156] DS[1771] SNMP: v1: sf-apc-emu1, dsname: apc, oid: 1.3.6.1.4.1.318.1.1.2.1.2.0, value: 30
06/12/2006 10:55:11 AM - CACTID: Poller[0] DEBUG: MySQL Insert ID '453': 'INSERT INTO poller_output (local_data_id,rrd_name,time,output) VALUE (1770,'apc','2006-06-12 10:55:11','27'),(1771,'apc','2006-06-12 10:55:11','30')'
06/12/2006 10:55:11 AM - CMDPHP: Poller[0] DEBUG: SQL Exec: "delete from poller_output where local_data_id='1771' and rrd_name='apc' and time='2006-06-12 10:55:11'"
06/12/2006 10:55:11 AM - CMDPHP: Poller[0] DEBUG: SQL Exec: "delete from poller_output where local_data_id='1770' and rrd_name='apc' and time='2006-06-12 10:55:11'"

06/12/2006 10:55:11 AM - POLLER: Poller[0] CACTI2RRD: /usr/local/rrdtool-1.2.12/bin/rrdtool update /usr/share/cacti/site/rra/sfapcemu1_apc_1771.rrd --template apc 1150134911:30
06/12/2006 10:55:11 AM - POLLER: Poller[0] CACTI2RRD: /usr/local/rrdtool-1.2.12/bin/rrdtool update /usr/share/cacti/site/rra/sfapcemu1_apc_1770.rrd --template apc 1150134911:27

<!-- Round Robin Database Dump -->
<rrd>
<version> 0003 </version>
<step> 300 </step> <!-- Seconds -->
<lastupdate> 1150132511 </lastupdate> <!-- 2006-06-12 10:15:11 PDT -->

<ds>
<name> apc </name>
<type> GAUGE </type>
<minimal_heartbeat> 600 </minimal_heartbeat>
<min> 0.0000000000e+00 </min>
<max> NaN </max>

<!-- PDP Status -->
<last_ds> UNKN </last_ds>
<value> NaN </value>
<unknown_sec> 11 </unknown_sec>
</ds>
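
The timestamps in the debug output and the dump can be compared directly. A minimal sketch, assuming the dump was taken after the logged 10:55:11 update; the function name `describe_gap` is only illustrative. It shows how far the file's lastupdate lags behind the update the poller logged, measured against the step and heartbeat from the dump. In RRDtool, a gap longer than the heartbeat causes the data source to be recorded as unknown for that interval.

```python
# Values copied from the CACTI2RRD log lines and the RRD dump in this post.
STEP = 300                      # <step> from the dump, in seconds
HEARTBEAT = 600                 # <minimal_heartbeat> from the dump, in seconds
LAST_UPDATE = 1150132511        # <lastupdate> from the dump
ATTEMPTED_UPDATE = 1150134911   # timestamp passed to "rrdtool update" in the log


def describe_gap(last_update, attempted, step, heartbeat):
    """Summarise the gap between the file's last update and a later update."""
    gap = attempted - last_update
    return {
        "gap_seconds": gap,
        "missed_steps": gap // step,          # whole polling intervals in the gap
        "exceeds_heartbeat": gap > heartbeat,  # True means the interval is unknown
    }


if __name__ == "__main__":
    print(describe_gap(LAST_UPDATE, ATTEMPTED_UPDATE, STEP, HEARTBEAT))
```

With these numbers the file is 2400 seconds (eight 5-minute steps) behind the logged update, which is consistent with the update never reaching the file.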

Any suggestions? This is making me crazy, and losing all historical data for 3000 rrds again is really a big problem for me.
Last edited by egironda on Tue Jun 13, 2006 3:28 pm, edited 1 time in total.
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

First, please grep the log for "SYSTEM STATS" to see whether the runtime limit was exceeded.
Second, for that many rrds it is recommended to increase the memory setting in php.ini to at least 64M.
Third, please check that only one poller is running. See /etc/crontab, /etc/cron.d/cacti and the crontabs of users root and cactiuser.
Reinhard
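
The third check can be partly automated. This is a minimal sketch, not part of Cacti: it scans crontab text for active lines that mention poller.php and reports where they were found. The function name `find_poller_entries` and the fallback behaviour are assumptions for illustration. The per-user crontabs of root and cactiuser are not read here and would still need `crontab -l -u <user>`.

```python
import os

# Files named in the advice above; per-user crontabs are not covered.
CRON_FILES = ["/etc/crontab", "/etc/cron.d/cacti"]


def find_poller_entries(sources):
    """Return (source, line) pairs for active cron lines that run poller.php.

    `sources` maps a label (for example a file path) to crontab text.
    Blank lines and lines starting with '#' are ignored.
    """
    hits = []
    for label, text in sources.items():
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            if "poller.php" in line:
                hits.append((label, line))
    return hits


if __name__ == "__main__":
    sources = {}
    for path in CRON_FILES:
        if os.path.isfile(path):
            with open(path, encoding="utf-8", errors="replace") as handle:
                sources[path] = handle.read()
    entries = find_poller_entries(sources)
    for label, line in entries:
        print(f"{label}: {line}")
    if len(entries) > 1:
        print("More than one poller entry found; pollers may overlap.")
```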
egironda
Posts: 45
Joined: Mon Dec 19, 2005 6:44 pm

Post by egironda »

Thanks for the starting points, I think I've checked most of these though...

I changed the php memory some time ago, in both php/cgi/php.ini and php/cli/php.ini

;;;;;;;;;;;;;;;;;;;
; Resource Limits ;
;;;;;;;;;;;;;;;;;;;

max_execution_time = 120 ; Maximum execution time of each script, in seconds
max_input_time = 120 ; Maximum amount of time each script may spend parsing request data
memory_limit = 128M ; Maximum amount of memory a script may consume (8MB)

The poller runtime was not exceeded.

06/12/2006 01:15:20 PM - SYSTEM STATS: Time:18.3541 Method:cactid Processes:1 Threads:20 Hosts:207 HostsPerProcess:207 DataSources:3876 RRDsProcessed:2668
06/12/2006 01:15:20 PM - EXPORT STATS: ExportTime:0.0017 TotalGraphs:0

I double checked that it is only in the /etc/cron.d/cacti crontab.

It seems like something gets broken between cacti and rrdtool, like the pipe isn't working right for some reason... I can't find anything that has changed on the system though.
egironda
Posts: 45
Joined: Mon Dec 19, 2005 6:44 pm

Post by egironda »

Some more data:

When I run poller.php from the commandline as my cacti user, I get the regular sorts of output, but this was unexpected:

tr: write error: Broken pipe
tr: write error: Broken pipe

That sounds like the culprit. Now just to find out why it's happening... any ideas very very very welcome!
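
For reference, "Broken pipe" is the error a writer gets when the process at the reading end of a pipe has already gone away. The sketch below only illustrates that general mechanism on a POSIX system; it does not reproduce what poller.php or `tr` actually do, which the output above does not show. The function name `write_to_exited_reader` is made up for the example.

```python
import subprocess
import sys


def write_to_exited_reader():
    """Write to a pipe whose reader has exited; return True on a broken pipe."""
    # The child exits immediately without ever reading its stdin.
    proc = subprocess.Popen(
        [sys.executable, "-c", "pass"],
        stdin=subprocess.PIPE,
        bufsize=0,  # unbuffered, so the write below hits the pipe directly
    )
    proc.wait()  # make sure the reading end is closed before writing
    try:
        proc.stdin.write(b"update\n")
        return False
    except BrokenPipeError:
        # Same condition that tr reports as "write error: Broken pipe".
        return True
    finally:
        proc.stdin.close()


if __name__ == "__main__":
    print("broken pipe:", write_to_exited_reader())
```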
egironda
Posts: 45
Joined: Mon Dec 19, 2005 6:44 pm

Post by egironda »

Another weird datapoint. Check out the way the number of RRDs processed fluctuates. The actual number of rrds in my directory is 3289, many of which have not been updated in a while and may just be stale and old... so where does 5242 come from?!

06/12/2006 02:25:27 PM - SYSTEM STATS: Time:26.4950 Method:cactid Processes:1 Threads:15 Hosts:209 HostsPerProcess:209 DataSources:3908 RRDsProcessed:2694

06/12/2006 02:25:27 PM - EXPORT STATS: ExportTime:0.0022 TotalGraphs:0
06/12/2006 02:28:45 PM - SYSTEM STATS: Time:26.4997 Method:cactid Processes:1 Threads:15 Hosts:209 HostsPerProcess:209 DataSources:3908 RRDsProcessed:2694

06/12/2006 02:28:45 PM - EXPORT STATS: ExportTime:0.0016 TotalGraphs:0
06/12/2006 02:29:48 PM - SYSTEM STATS: Time:28.5204 Method:cactid Processes:1 Threads:18 Hosts:209 HostsPerProcess:209 DataSources:3908 RRDsProcessed:2694
06/12/2006 02:29:48 PM - EXPORT STATS: ExportTime:0.0016 TotalGraphs:0

06/12/2006 02:30:35 PM - SYSTEM STATS: Time:33.6723 Method:cactid Processes:1 Threads:18 Hosts:209 HostsPerProcess:209 DataSources:3908 RRDsProcessed:2694
06/12/2006 02:30:35 PM - EXPORT STATS: ExportTime:0.0016 TotalGraphs:0

06/12/2006 02:35:37 PM - SYSTEM STATS: Time:35.0602 Method:cactid Processes:1 Threads:18 Hosts:209 HostsPerProcess:209 DataSources:3908 RRDsProcessed:5242
06/12/2006 02:35:37 PM - EXPORT STATS: ExportTime:0.0017 TotalGraphs:0

06/12/2006 02:39:47 PM - SYSTEM STATS: Time:36.1357 Method:cactid Processes:1 Threads:18 Hosts:209 HostsPerProcess:209 DataSources:3908 RRDsProcessed:3191
06/12/2006 02:39:47 PM - EXPORT STATS: ExportTime:0.0016 TotalGraphs:0

06/12/2006 02:40:32 PM - SYSTEM STATS: Time:30.5237 Method:cactid Processes:1 Threads:18 Hosts:209 HostsPerProcess:209 DataSources:3908 RRDsProcessed:2722
06/12/2006 02:40:32 PM - EXPORT STATS: ExportTime:0.0016 TotalGraphs:0
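
The stats lines above are easier to compare once parsed. A minimal sketch, assuming the log format shown in this post; the names `parse_stats` and `run_intervals` are illustrative. It extracts the timestamp and RRDsProcessed from each SYSTEM STATS line and computes the seconds between consecutive runs. Intervals well under the 300 second step suggest that extra runs (for example manual ones) were mixed in with the scheduled ones, though that is only one possible reading and does not by itself explain the 5242 figure.

```python
import re
from datetime import datetime

STATS_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M) - SYSTEM STATS: "
    r"Time:([\d.]+) .*RRDsProcessed:(\d+)"
)

# SYSTEM STATS lines copied from the post above (fields shortened with "...").
SAMPLE_LOG = """\
06/12/2006 02:25:27 PM - SYSTEM STATS: Time:26.4950 ... RRDsProcessed:2694
06/12/2006 02:28:45 PM - SYSTEM STATS: Time:26.4997 ... RRDsProcessed:2694
06/12/2006 02:29:48 PM - SYSTEM STATS: Time:28.5204 ... RRDsProcessed:2694
06/12/2006 02:30:35 PM - SYSTEM STATS: Time:33.6723 ... RRDsProcessed:2694
06/12/2006 02:35:37 PM - SYSTEM STATS: Time:35.0602 ... RRDsProcessed:5242
06/12/2006 02:39:47 PM - SYSTEM STATS: Time:36.1357 ... RRDsProcessed:3191
06/12/2006 02:40:32 PM - SYSTEM STATS: Time:30.5237 ... RRDsProcessed:2722
"""


def parse_stats(log_text):
    """Return a list of (finished_at, runtime_seconds, rrds_processed)."""
    runs = []
    for line in log_text.splitlines():
        match = STATS_RE.match(line.strip())
        if match:
            finished = datetime.strptime(match.group(1), "%m/%d/%Y %I:%M:%S %p")
            runs.append((finished, float(match.group(2)), int(match.group(3))))
    return runs


def run_intervals(runs):
    """Seconds between the end of each run and the end of the next one."""
    return [
        int((later[0] - earlier[0]).total_seconds())
        for earlier, later in zip(runs, runs[1:])
    ]


if __name__ == "__main__":
    runs = parse_stats(SAMPLE_LOG)
    for (finished, runtime, rrds), gap in zip(runs, [None] + run_intervals(runs)):
        print(finished.time(), f"runtime={runtime:.1f}s", f"rrds={rrds}", f"gap={gap}")
```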
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

egironda wrote:
tr: write error: Broken pipe
tr: write error: Broken pipe

That sounds like the culprit. Now just to find out why it's happening... any ideas very very very welcome!

I already thought of this. Some time ago there was a discussion about PHP and broken rrdtool pipes, but I don't remember the details, and a short forum and Google search didn't turn it up...
Reinhard
egironda
Posts: 45
Joined: Mon Dec 19, 2005 6:44 pm

Post by egironda »

Well... another mystery repaired but not solved.

I removed a bunch of stale rrd files that were sitting in the rrd directory... ones that hadn't been updated in several weeks, for whatever reason. Once I did that, it seemed to start up writing to files okay again.
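
For anyone wanting to find such files before removing anything, here is a minimal sketch that only lists .rrd files whose modification time is older than a cutoff. The function name `find_stale_rrds` and the 14-day default are assumptions for illustration; the directory is the rra path from the log lines earlier in this thread. It deletes nothing, so the list can be reviewed (and the files moved aside) first.

```python
import os
import time

# rra directory as it appears in the CACTI2RRD log lines in this thread.
RRA_DIR = "/usr/share/cacti/site/rra"


def find_stale_rrds(rra_dir, max_age_days=14, now=None):
    """Return sorted paths of .rrd files not modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for name in sorted(os.listdir(rra_dir)):
        path = os.path.join(rra_dir, name)
        if not name.endswith(".rrd") or not os.path.isfile(path):
            continue
        if os.path.getmtime(path) < cutoff:
            stale.append(path)
    return stale


if __name__ == "__main__":
    if os.path.isdir(RRA_DIR):
        for path in find_stale_rrds(RRA_DIR):
            print(path)
    else:
        print(f"{RRA_DIR} not found; pass your own rra directory instead.")
```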

My thought is that one of those files contained something broken that broke it in a bad way... I of course have no way of determining *what*, but if it's working for now...

I also downgraded from the SVN version of cactid (0.8.6h) to the release version (0.8.6g labelled 0.8.6f), just in case, but the graphing fixed itself before I did that...

If anyone has heard of anything like this, let me know... I'd love to find some way of preventing this from happening again.