A lot of my graphs have abruptly stopped updating again, or they update for only one poll, very intermittently.
As far as I can tell, the poller is successfully collecting the data, and it seems to be inserting it into the RRD, but the RRD never gets updated with valid data. This is not the first time this has happened. Last time, I moved the RRDs out of the way and let them be re-created, and that seemed to fix it.
I am running fully patched, current code on Debian. I've cleared the poller cache, and also checked to make sure the poller_output table was zeroed.
mysql Ver 14.7 Distrib 4.1.11, for pc-linux-gnu (i386)
RRDtool 1.2.12 Copyright 1997-2005 by Tobias Oetiker <tobi@oetiker.ch>
CACTID 0.8.6h Copyright 2002-2006 by The Cacti Group
Cacti 0.8.6h
Some debug:
06/12/2006 10:55:11 AM - CACTID: Poller[0] Host[156] RECACHE: Processing 1 items in the auto reindex cache for 'sf-apc-emu1'
06/12/2006 10:55:11 AM - CACTID: Poller[0] Host[156] DS[1770] SNMP: v1: sf-apc-emu1, dsname: apc, oid: 1.3.6.1.4.1.318.1.1.2.1.1.0, value: 27
06/12/2006 10:55:11 AM - CACTID: Poller[0] Host[156] DS[1771] SNMP: v1: sf-apc-emu1, dsname: apc, oid: 1.3.6.1.4.1.318.1.1.2.1.2.0, value: 30
06/12/2006 10:55:11 AM - CACTID: Poller[0] DEBUG: MySQL Insert ID '453': 'INSERT INTO poller_output (local_data_id,rrd_name,time,output) VALUE (1770,'apc','2006-06-12 10:55:11','27'),(1771,'apc','2006-06-12 10:55:11','30')'
06/12/2006 10:55:11 AM - CMDPHP: Poller[0] DEBUG: SQL Exec: "delete from poller_output where local_data_id='1771' and rrd_name='apc' and time='2006-06-12 10:55:11'"
06/12/2006 10:55:11 AM - CMDPHP: Poller[0] DEBUG: SQL Exec: "delete from poller_output where local_data_id='1770' and rrd_name='apc' and time='2006-06-12 10:55:11'"
06/12/2006 10:55:11 AM - POLLER: Poller[0] CACTI2RRD: /usr/local/rrdtool-1.2.12/bin/rrdtool update /usr/share/cacti/site/rra/sfapcemu1_apc_1771.rrd --template apc 1150134911:30
06/12/2006 10:55:11 AM - POLLER: Poller[0] CACTI2RRD: /usr/local/rrdtool-1.2.12/bin/rrdtool update /usr/share/cacti/site/rra/sfapcemu1_apc_1770.rrd --template apc 1150134911:27
<!-- Round Robin Database Dump -->
<rrd>
<version> 0003 </version>
<step> 300 </step> <!-- Seconds -->
<lastupdate> 1150132511 </lastupdate> <!-- 2006-06-12 10:15:11 PDT -->
<ds>
<name> apc </name>
<type> GAUGE </type>
<minimal_heartbeat> 600 </minimal_heartbeat>
<min> 0.0000000000e+00 </min>
<max> NaN </max>
<!-- PDP Status -->
<last_ds> UNKN </last_ds>
<value> NaN </value>
<unknown_sec> 11 </unknown_sec>
</ds>
Any suggestions? This is making me crazy, and losing all historical data for 3000 rrds again is really a big problem for me.
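One way to confirm whether an update from a run like the one logged above actually reached the file is to compare the RRD's last update time before and after a poll. This is just a sketch; the binary and file paths are taken from the logs above and will need adjusting for another install:

```shell
# Print the RRD's last update timestamp and values. Run it before and
# after a poller cycle: if the timestamp does not advance, the update
# never landed despite what the CACTI2RRD log line says.
RRDTOOL=/usr/local/rrdtool-1.2.12/bin/rrdtool   # path from the logs above
RRD=/usr/share/cacti/site/rra/sfapcemu1_apc_1770.rrd
if [ -x "$RRDTOOL" ] && [ -f "$RRD" ]; then
    "$RRDTOOL" lastupdate "$RRD"
else
    echo "adjust RRDTOOL/RRD paths for your install"
fi
```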
Abruptly stopped graphing - broken pipe to RRDtool?
Last edited by egironda on Tue Jun 13, 2006 3:28 pm, edited 1 time in total.
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
- Contact:
Please grep for "SYSTEM STATS" to see whether the runtime limit was exceeded.
Second, with that many RRDs it is recommended to increase the memory_limit setting in php.ini to at least 64M.
Third, please check that only one poller is running. See /etc/crontab, /etc/cron.d/cacti, and the crontabs of the users root and cactiuser.
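A quick way to run the first and third of those checks from a shell. The log path is a guess at a typical Debian Cacti layout, so adjust it for your install:

```shell
# 1) Check recent poller runtimes against the 300s polling interval.
grep 'SYSTEM STATS' /usr/share/cacti/site/log/cacti.log 2>/dev/null | tail -5

# 2) Make sure poller.php is scheduled in exactly one place.
for f in /etc/crontab /etc/cron.d/cacti; do
    [ -f "$f" ] && grep -H 'poller\.php' "$f"
done
# Per-user crontabs (listing other users' crontabs requires root):
for u in root cactiuser; do
    crontab -l -u "$u" 2>/dev/null | grep 'poller\.php' || true
done
```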
Reinhard
Thanks for the starting points, I think I've checked most of these though...
I changed the php memory some time ago, in both php/cgi/php.ini and php/cli/php.ini
;;;;;;;;;;;;;;;;;;;
; Resource Limits ;
;;;;;;;;;;;;;;;;;;;
max_execution_time = 120 ; Maximum execution time of each script, in seconds
max_input_time = 120 ; Maximum amount of time each script may spend parsing request data
memory_limit = 128M ; Maximum amount of memory a script may consume (8MB)
The poller runtime was not exceeded.
06/12/2006 01:15:20 PM - SYSTEM STATS: Time:18.3541 Method:cactid Processes:1 Threads:20 Hosts:207 HostsPerProcess:207 DataSources:3876 RRDsProcessed:2668
06/12/2006 01:15:20 PM - EXPORT STATS: ExportTime:0.0017 TotalGraphs:0
I double checked that it is only in the /etc/cron.d/cacti crontab.
It seems like something gets broken between cacti and rrdtool, like the pipe isn't working right for some reason... I can't find anything that has changed on the system though.
Some more data:
When I run poller.php from the commandline as my cacti user, I get the regular sorts of output, but this was unexpected:
tr: write error: Broken pipe
tr: write error: Broken pipe
That sounds like the culprit. Now just to find out why it's happening... any ideas very very very welcome!
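One way to test the Cacti-to-rrdtool pipe by hand is to feed rrdtool a command on standard input, which is roughly what poller.php does through its pipe (`rrdtool -` reads commands from stdin). Paths below are taken from the logs above; treat this as a sketch:

```shell
# Exercise rrdtool's stdin mode the way the poller pipe does. If
# "Broken pipe" shows up here too, the problem is on the rrdtool side
# of the pipe rather than in Cacti itself.
RRDTOOL=/usr/local/rrdtool-1.2.12/bin/rrdtool   # path from the logs above
RRD=/usr/share/cacti/site/rra/sfapcemu1_apc_1770.rrd
if [ -x "$RRDTOOL" ]; then
    echo "lastupdate $RRD" | "$RRDTOOL" - || echo "pipe failed: status $?"
else
    echo "adjust RRDTOOL path for your install"
fi
```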
Another weird data point. Check out the way the number of RRDs processed fluctuates. The actual number of RRD files in my directory is 3289, many of which have not been updated in a while and may just be stale and old... so where does 5242 come from?!
06/12/2006 02:25:27 PM - SYSTEM STATS: Time:26.4950 Method:cactid Processes:1 Threads:15 Hosts:209 HostsPerProcess:209 DataSources:3908 RRDsProcessed:2694
06/12/2006 02:25:27 PM - EXPORT STATS: ExportTime:0.0022 TotalGraphs:0
06/12/2006 02:28:45 PM - SYSTEM STATS: Time:26.4997 Method:cactid Processes:1 Threads:15 Hosts:209 HostsPerProcess:209 DataSources:3908 RRDsProcessed:2694
06/12/2006 02:28:45 PM - EXPORT STATS: ExportTime:0.0016 TotalGraphs:0
06/12/2006 02:29:48 PM - SYSTEM STATS: Time:28.5204 Method:cactid Processes:1 Threads:18 Hosts:209 HostsPerProcess:209 DataSources:3908 RRDsProcessed:2694
06/12/2006 02:29:48 PM - EXPORT STATS: ExportTime:0.0016 TotalGraphs:0
06/12/2006 02:30:35 PM - SYSTEM STATS: Time:33.6723 Method:cactid Processes:1 Threads:18 Hosts:209 HostsPerProcess:209 DataSources:3908 RRDsProcessed:2694
06/12/2006 02:30:35 PM - EXPORT STATS: ExportTime:0.0016 TotalGraphs:0
06/12/2006 02:35:37 PM - SYSTEM STATS: Time:35.0602 Method:cactid Processes:1 Threads:18 Hosts:209 HostsPerProcess:209 DataSources:3908 RRDsProcessed:5242
06/12/2006 02:35:37 PM - EXPORT STATS: ExportTime:0.0017 TotalGraphs:0
06/12/2006 02:39:47 PM - SYSTEM STATS: Time:36.1357 Method:cactid Processes:1 Threads:18 Hosts:209 HostsPerProcess:209 DataSources:3908 RRDsProcessed:3191
06/12/2006 02:39:47 PM - EXPORT STATS: ExportTime:0.0016 TotalGraphs:0
06/12/2006 02:40:32 PM - SYSTEM STATS: Time:30.5237 Method:cactid Processes:1 Threads:18 Hosts:209 HostsPerProcess:209 DataSources:3908 RRDsProcessed:2722
06/12/2006 02:40:32 PM - EXPORT STATS: ExportTime:0.0016 TotalGraphs:0
egironda wrote:
tr: write error: Broken pipe
tr: write error: Broken pipe
That sounds like the culprit. Now just to find out why it's happening... any ideas very very very welcome!
I already thought of this. Some time ago there was a discussion on PHP and broken rrdtool pipes, but I don't remember it and a short search/Google didn't turn it up...
Reinhard
Well... another mystery repaired but not solved.
I removed a bunch of stale rrd files that were sitting in the rrd directory... ones that hadn't been updated in several weeks, for whatever reason. Once I did that, it seemed to start up writing to files okay again.
My guess is that one of those files contained something corrupt that broke the update process in a bad way... I of course have no way of determining *what*, but if it's working for now...
I also downgraded from the SVN version of cactid (0.8.6h) to the release version (0.8.6g labelled 0.8.6f), just in case, but the graphing fixed itself before I did that...
If anyone has heard of anything like this, let me know... I'd love to find some way of preventing this from happening again.
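For finding stale files like the ones described above, something along these lines may help. The rra path is taken from the logs earlier in the thread; review the list before deleting anything:

```shell
# List .rrd files whose mtime is older than 14 days: the "haven't been
# updated in weeks" candidates. -ls prints the details so the list can
# be reviewed before anything is removed.
find /usr/share/cacti/site/rra -name '*.rrd' -mtime +14 -ls 2>/dev/null || true
```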