Cacti suddenly stopped graphing
Hi
I came in this morning and found that my Cacti server had stopped graphing at about 5am.
I have followed the debugging guide and the permissions debugging guide, but I have not been able to fix the issue. It looks like the poller is taking too long to run.
07/12/2010 12:29:00 PM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 2822, Data Sources: memoryAlloc(DS[113]), NS_Int_traffic_in(DS[118]), NS_Int_traffic_out(DS[118]), NS_Int_traffic_in(DS[119]), NS_Int_traffic_out(DS[119]), NS_Int_traffic_in(DS[120]), NS_Int_traffic_out(DS[120]), cpu5(DS[121]), sessionsFail(DS[126]), NS_Int_traffic_in(DS[127]), NS_Int_traffic_out(DS[127]), NS_Int_traffic_in(DS[128]), NS_Int_traffic_out(DS[128]), sessionsAlloc(DS[133]), NS_Int_traffic_in(DS[135]), NS_Int_traffic_out(DS[135]), NS_Int_traffic_in(DS[136]), NS_Int_traffic_out(DS[136]), NS_Int_traffic_in(DS[137]), NS_Int_traffic_out(DS[137]), memoryFree(DS[141]), Additional Issues Remain. Only showing first 20
07/12/2010 12:29:00 PM - POLLER: Poller[0] WARNING: Scheduled Task is out of sync with the Poller Interval! The Poller Interval is '60' seconds, with a maximum of a '300' second Scheduled Task, but 598 seconds have passed since the last poll!
07/12/2010 12:23:42 PM - SYSTEM STATS: Time:280.5344 Method:spine Processes:2 Threads:15 Hosts:394 HostsPerProcess:197 DataSources:2890 RRDsProcessed:1381
07/12/2010 12:19:02 PM - POLLER: Poller[0] WARNING: Scheduled Task is out of sync with the Poller Interval! The Poller Interval is '60' seconds, with a maximum of a '300' second Scheduled Task, but 600 seconds have passed since the last poll!
07/12/2010 12:11:07 PM - SYSTEM STATS: Time:124.9352 Method:spine Processes:2 Threads:15 Hosts:394 HostsPerProcess:197 DataSources:2912 RRDsProcessed:1380
07/12/2010 12:09:02 PM - POLLER: Poller[0] WARNING: Scheduled Task is out of sync with the Poller Interval! The Poller Interval is '60' seconds, with a maximum of a '300' second Scheduled Task, but 602 seconds have passed since the last poll!
07/12/2010 11:59:01 AM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 2930, Data Sources: ChassisFan(DS[14]), bigip_traffic_in(DS[16]), bigip_traffic_out(DS[16]), bigip_traffic_in(DS[17]), bigip_traffic_out(DS[17]), sysCpuTemperature(DS[21]), sysHostMemoryUsed(DS[35]), ChassisFan(DS[40]), bigip_traffic_in(DS[44]), bigip_traffic_out(DS[44]), bigip_traffic_in(DS[45]), bigip_traffic_out(DS[45]), TmIdleCycles(DS[47]), TmSleepCycles(DS[47]), TmTotalCycles(DS[47]), memoryAlloc(DS[68]), NS_Int_traffic_in(DS[73]), NS_Int_traffic_out(DS[73]), NS_Int_traffic_in(DS[74]), NS_Int_traffic_out(DS[74]), NS_Int_traffic_in(DS[75]), Additional Issues Remain. Only showing first 20
07/12/2010 11:59:01 AM - POLLER: Poller[0] WARNING: Scheduled Task is out of sync with the Poller Interval! The Poller Interval is '60' seconds, with a maximum of a '300' second Scheduled Task, but 707 seconds have passed since the last poll!
07/12/2010 11:42:50 AM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 5811, Data Sources: ChassisFan(DS[13]), ChassisFan(DS[14]), bigip_traffic_in(DS[16]), bigip_traffic_in(DS[16]), bigip_traffic_out(DS[16]), bigip_traffic_out(DS[16]), bigip_traffic_in(DS[17]), bigip_traffic_in(DS[17]), bigip_traffic_out(DS[17]), bigip_traffic_out(DS[17]), TmIdleCycles(DS[20]), TmSleepCycles(DS[20]), TmTotalCycles(DS[20]), sysCpuTemperature(DS[21]), sysHostMemoryTotal(DS[34]), sysHostMemoryUsed(DS[35]), sysHostMemoryUsed(DS[39]), ChassisFan(DS[40]), bigip_traffic_in(DS[44]), bigip_traffic_in(DS[44]), bigip_traffic_out(DS[44]), Additional Issues Remain. Only showing first 20
07/12/2010 11:42:49 AM - POLLER: Poller[0] WARNING: Scheduled Task is out of sync with the Poller Interval! The Poller Interval is '60' seconds, with a maximum of a '300' second Scheduled Task, but 1019 seconds have passed since the last poll!
07/12/2010 11:27:18 AM - WEBUI: Cacti Log Cleared from Web Management Interface
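One way to see how far the poller is overrunning is to pull the Time value out of each SYSTEM STATS line and compare it with the 60 second poller interval reported in the warnings. The following is only a minimal sketch: the function names and the regular expression are hypothetical helpers written for this log format, not part of Cacti.

```python
import re

# Matches Cacti "SYSTEM STATS" log lines in the format quoted in this post.
# The pattern is an assumption based on those lines only.
STATS_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M)"
    r" - SYSTEM STATS: (?P<fields>.*)$"
)


def parse_stats(line):
    """Return the key:value fields of a SYSTEM STATS line, or None."""
    match = STATS_RE.match(line.strip())
    if match is None:
        return None
    stats = {"stamp": match.group("stamp")}
    for pair in match.group("fields").split():
        key, _, value = pair.partition(":")
        stats[key] = value
    return stats


def slow_runs(lines, interval_seconds=60):
    """List (timestamp, runtime) for poller runs longer than the interval."""
    slow = []
    for line in lines:
        stats = parse_stats(line)
        if stats is not None and float(stats["Time"]) > interval_seconds:
            slow.append((stats["stamp"], float(stats["Time"])))
    return slow


# Lines copied from the log above.
sample = [
    "07/12/2010 12:23:42 PM - SYSTEM STATS: Time:280.5344 Method:spine Processes:2 Threads:15 Hosts:394 HostsPerProcess:197 DataSources:2890 RRDsProcessed:1381",
    "07/12/2010 12:11:07 PM - SYSTEM STATS: Time:124.9352 Method:spine Processes:2 Threads:15 Hosts:394 HostsPerProcess:197 DataSources:2912 RRDsProcessed:1380",
    "07/12/2010 11:27:18 AM - WEBUI: Cacti Log Cleared from Web Management Interface",
]

for stamp, seconds in slow_runs(sample):
    print(stamp, seconds)
```

Both SYSTEM STATS lines in the log exceed the 60 second interval, which fits the "out of sync" warnings around them.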
I have cleared the poller cache. I have stopped and restarted the poller within Cacti and also the scheduled task. I have tried running it manually but it seems to take ages.
I have checked permissions and am not getting any denied messages.
I have checked the poller output table. I cleared it, but I have also seen it populate and then clear itself.
mysql> select count(*) from poller_output;
+----------+
| count(*) |
+----------+
| 2681 |
+----------+
1 row in set (0.02 sec)
mysql> select count(*) from poller_output;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
I have also checked the MySQL DB and found no issues there; I ran a repair anyway just to be sure.
I have run a debug against one host and I can't see any errors.
07/12/2010 12:31:43 PM - SPINE: Poller[0] Time: 1.2180 s, Threads: 15, Hosts: 2
07/12/2010 12:31:43 PM - SPINE: Poller[0] DEBUG: MYSQL Free & Close Completed
07/12/2010 12:31:43 PM - SPINE: Poller[0] DEBUG: Allocated Variable Memory Freed
07/12/2010 12:31:43 PM - SPINE: Poller[0] DEBUG: PHP Script Server Pipes Closed
07/12/2010 12:31:43 PM - SPINE: Poller[0] DEBUG: Thread Cleanup Complete
07/12/2010 12:31:43 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 0
07/12/2010 12:31:43 PM - SPINE: Poller[0] Host[20] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
07/12/2010 12:31:43 PM - SPINE: Poller[0] Host[20] DS[185] SNMP: v2: 172.27.212.121, dsname: NS_Int_traffic_in, oid: .1.3.6.1.4.1.3224.9.3.1.3.2, value: 282041294
07/12/2010 12:31:42 PM - SPINE: Poller[0] Host[20] DS[185] SNMP: v2: 172.27.212.121, dsname: NS_Int_traffic_out, oid: .1.3.6.1.4.1.3224.9.3.1.5.2, value: 2817055871
07/12/2010 12:31:42 PM - SPINE: Poller[0] Host[20] DS[184] SNMP: v2: 172.27.212.121, dsname: NS_Int_traffic_in, oid: .1.3.6.1.4.1.3224.9.3.1.3.0, value: 936302330
07/12/2010 12:31:42 PM - SPINE: Poller[0] Host[20] DS[184] SNMP: v2: 172.27.212.121, dsname: NS_Int_traffic_out, oid: .1.3.6.1.4.1.3224.9.3.1.5.0, value: 33187676
07/12/2010 12:31:42 PM - SPINE: Poller[0] Host[20] DS[183] SNMP: v2: 172.27.212.121, dsname: sessionsFail, oid: .1.3.6.1.4.1.3224.16.3.4.0, value: 0
07/12/2010 12:31:42 PM - SPINE: Poller[0] Host[20] DS[178] SNMP: v2: 172.27.212.121, dsname: cpu5, oid: .1.3.6.1.4.1.3224.16.1.3.0, value: 2
07/12/2010 12:31:42 PM - SPINE: Poller[0] Host[20] NOTE: There are '6' Polling Items for this Host
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
07/12/2010 12:31:42 PM - SPINE: Poller[0] Host[20] RECACHE: Processing 1 items in the auto reindex cache for '172.27.212.121'
07/12/2010 12:31:42 PM - SPINE: Poller[0] Host[20] SNMP Result: Host responded to SNMP
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 2
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
07/12/2010 12:31:42 PM - SPINE: Poller[0] Host[0] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: Initial Value of Active Threads is 0
07/12/2010 12:31:42 PM - SPINE: Poller[0] SPINE: Initializing PHP Script Server(s)
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: SNMP Library Version is 5.4.1
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: SNMP Header Version is 5.4.1
07/12/2010 12:31:42 PM - SPINE: Poller[0] SPINE: Initializing Net-SNMP API
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: MySQL is Thread Safe!
07/12/2010 12:31:42 PM - SPINE: Poller[0] Version 0.8.7e starting
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The Maximum SNMP OID Get Size is 30
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The PHP Script Server is Not Required
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: StartHost='20', EndHost='20', TotalPHPScripts='0'
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The number of php script servers to run is 4
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The script timeout is 60
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The number of concurrent processes is 2
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The polling interval is 60 seconds
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The threads variable is 15
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The log_pstats variable is 0
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The boost_redirect variable is 0
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The log_pwarn variable is 0
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The log_perror variable is 1
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The snmp_retries variable is 3
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The ping_timeout variable is 500
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The ping_retries variable is 1
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The ping_method variable is 1
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The ping_failure_count variable is 2
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The ping_recovery_count variable is 3
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The availability_method variable is 2
07/12/2010 12:31:42 PM - SPINE: Poller[0] DEBUG: The path_php variable is f:/php/php.exe
I have also tried the following command
F:\spine>f:\php\php.exe -q f:\apache2\htdocs\cacti\poller.php --force
07/12/2010 01:04:55 PM - POLLER: Poller[0] WARNING: Scheduled Task is out of sync with the Poller Interval! The Poller Interval is '60' seconds, with a maximum of a '300' second Scheduled Task, but 378 seconds have passed since the last poll!
07/12/2010 01:09:20 PM - SYSTEM STATS: Time:265.1056 Method:spine Processes:1 Threads:15 Hosts:394 HostsPerProcess:394 DataSources:2930 RRDsProcessed:1380
I have also seen these messages in the logs.
07/12/2010 12:49:09 PM - SPINE: Poller[0] ERROR: SQL Failed! Error:'1062', Message:'Duplicate entry '2899-numnds-2010-07-12 12:49:09' for key 1', SQL Fragment:'INSERT INTO poller_output (local_data_id, rrd_name, time, output) VALUES (2899,'numnds','2010-07-12 12:49:09','257'),(2899,'sum2pds','2010-07-12 12:49:09','359'),(2899,'sum2nds','2010-07-12 12:49:09','349'),(2899,'numpsd','2010-07-12 12:49:09','283'),(2899,'sumpds','2010-07-12 12:49:09','277'),(2899,'sumnsd','2010-07-12 12:49:09','533'),(2899,'numpds','2010-07-12 12:49:09','251'),(2899,'sum2psd','2010-07-12 12:49:09','2535'),(2899,'numnsd','2010-07-12 12:49:09','292'),(2899,'sum2nsd','2010-07-12 12:49:09','2391'),(2899,'sumnds','2010-07-12 12:49:09','277'),(2899,'sumpsd','2010-07-12 12:49:09','533'),(2901,'maxpsd','2010-07-12 12:49:09','23'),(2901,'maxnds','2010-07-12 12:49:09','6'),(2901,'maxnsd','2010-07-12 12:49:09','20'),(2901,'maxpds','2010-07-12 12:49:09','5'),(2902,'Late','2010-07-12 12:49:09','0'),(2902,'lossDS','2010-07-12 12:49:09','0'),(2902,'OOS','2010-07-12 12:49:09','0'),(2902,'MIA','2010-07-12 12:49:09','0'),(2902,'lossSD','2010-07-12 12:49:09','0'),(2902,'rtt','2010-07-12 12:49:09','36'),(290'
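MySQL error 1062 is a duplicate-key error, so the message above means a row with the same key already existed in poller_output when spine tried to insert it. To see which data sources are affected, the key can be pulled out of each error line. This is a rough sketch only: the helper names are hypothetical, and it assumes the key has the form local_data_id-rrd_name-time, as the messages in this log suggest.

```python
import re

# Pulls the duplicated key out of a spine "SQL Failed! Error:'1062'" line.
DUP_RE = re.compile(
    r"Error:'1062', Message:'Duplicate entry '(?P<key>.+?)' for key"
)
# Matches each complete (local_data_id,'rrd_name','time','output') row.
# A row cut off at the end of the logged fragment is simply skipped.
ROW_RE = re.compile(r"\((\d+),'([^']*)','([^']*)','([^']*)'\)")


def duplicate_key(line):
    """Return (local_data_id, rrd_name, time) from a 1062 error, or None."""
    match = DUP_RE.search(line)
    if match is None:
        return None
    local_data_id, _, rest = match.group("key").partition("-")
    # Assumption: the key ends with a 19-character 'YYYY-MM-DD HH:MM:SS'
    # timestamp, preceded by a hyphen.
    return int(local_data_id), rest[:-20], rest[-19:]


def fragment_rows(line):
    """Return the complete value rows found in the logged SQL fragment."""
    return ROW_RE.findall(line)


# Shortened copy of the error line above (only the first two rows kept).
line = (
    "07/12/2010 12:49:09 PM - SPINE: Poller[0] ERROR: SQL Failed! "
    "Error:'1062', Message:'Duplicate entry "
    "'2899-numnds-2010-07-12 12:49:09' for key 1', SQL Fragment:'INSERT "
    "INTO poller_output (local_data_id, rrd_name, time, output) VALUES "
    "(2899,'numnds','2010-07-12 12:49:09','257'),"
    "(2899,'sum2pds','2010-07-12 12:49:09','359'),(290'"
)

print(duplicate_key(line))
print(len(fragment_rows(line)), "complete rows in the fragment")
```

Running this over the whole log would show whether the duplicates cluster on a few data sources or are spread across all of them.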
I have changed the poller to use 1 process instead of 2.
I have disabled all devices that could use a slow script, i.e. the CBWFQ PHP one and the NBAR one. That still hasn't solved the problem.
RRD files are updating.
Has anyone any ideas please?
If you require any further information let me know.
Cheers
Jay
Cacti Version 0.8.7e, Spine 0.8.7e, Apache 2.2.15, Mysql 5.0.88, PHP 5.2.13, RRDTool 1.2.30, NET-SNMP 5.5
Quad Core AMD Opteron Processor 2384, 2.70Ghz, 2GB RAM , 1 CPU used
Windows Server 2003 (X64), VMWARE ESX
Plugins: Aggregate 0.75
SYSTEM STATS: Time:12.5140 Method:spine Processes:2 Threads:15 Hosts:400 HostsPerProcess:200 DataSources:2909 RRDsProcessed:1384
I have just restored my DB from last week's backup, but I'm still seeing the same issue. Something must be causing the poller to get out of sync and take ages to run; I just don't know what.
I have just removed the scheduled task and set the poller to use cmd.php. I then ran spine from the CLI and got this:
F:\spine>spine.exe
NOTE: The Shell Command Exists in the current directory
SPINE: Using spine config file [spine.conf]
SPINE: Version 0.8.7e starting
07/12/2010 03:59:33 PM - SPINE: Poller[0] ERROR: SQL Failed! Error:'1062', Message:'Duplicate entry '1189-traffic_in-2010-07-12 15:59:32' for key 1', SQL Fragment:'INSERT INTO poller_output (local_data_id, rrd_name, time, output) VALUES (1181,'dropped_in','2010-07-12 15:59:32','455'),(1186,'asa_memused','2010-07-12 15:59:32','131327984'),(1189,'traffic_in','2010-07-12 15:59:32','183434401160'),(1189,'traffic_out','2010-07-12 15:59:32','15699780153'),(1190,'traffic_in','2010-07-12 15:59:32','19815734412'),(1190,'traffic_out','2010-07-12 15:59:32','77724396931')'
SPINE: Time: 9.2530 s, Threads: 15, Hosts: 392
F:\spine>spine.exe
NOTE: The Shell Command Exists in the current directory
SPINE: Using spine config file [spine.conf]
SPINE: Version 0.8.7e starting
SPINE: Time: 8.6600 s, Threads: 15, Hosts: 392
F:\spine>spine.exe
NOTE: The Shell Command Exists in the current directory
SPINE: Using spine config file [spine.conf]
SPINE: Version 0.8.7e starting
07/12/2010 04:01:25 PM - SPINE: Poller[0] ERROR: SQL Failed! Error:'1062', Message:'Duplicate entry '2914-traffic_out-2010-07-12 16:01:25' for key 1', SQL Fragment:'INSERT INTO poller_output (local_data_id, rrd_name, time, output) VALUES (2914,'traffic_out','2010-07-12 16:01:25','119542947'),(2914,'traffic_in','2010-07-12 16:01:25','456242016'),(2915,'traffic_out','2010-07-12 16:01:25','3403303128'),(2915,'traffic_in','2010-07-12 16:01:25','2837387479')'
07/12/2010 04:01:26 PM - SPINE: Poller[0] ERROR: SQL Failed! Error:'1062', Message:'Duplicate entry '2920-traffic_in-2010-07-12 16:01:25' for key 1', SQL Fragment:'INSERT INTO poller_output (local_data_id, rrd_name, time, output) VALUES (2916,'5min_cpu','2010-07-12 16:01:25','2'),(2920,'traffic_in','2010-07-12 16:01:25','2142092695'),(2920,'traffic_out','2010-07-12 16:01:25','4271247815'),(2921,'traffic_in','2010-07-12 16:01:25','3474926298'),(2921,'traffic_out','2010-07-12 16:01:25','4114390338')'
07/12/2010 04:01:26 PM - SPINE: Poller[0] ERROR: SQL Failed! Error:'1062', Message:'Duplicate entry '2927-traffic_in-2010-07-12 16:01:25' for key 1', SQL Fragment:'INSERT INTO poller_output (local_data_id, rrd_name, time, output) VALUES (2923,'5sec_cpu','2010-07-12 16:01:25','5'),(2927,'traffic_in','2010-07-12 16:01:25','3215481537'),(2927,'traffic_out','2010-07-12 16:01:25','935466407'),(2928,'traffic_in','2010-07-12 16:01:25','4031542309'),(2928,'traffic_out','2010-07-12 16:01:25','3152348008')'
07/12/2010 04:01:26 PM - SPINE: Poller[0] ERROR: SQL Failed! Error:'1062', Message:'Duplicate entry '2904-numnds-2010-07-12 16:01:25' for key 1', SQL Fragment:'INSERT INTO poller_output (local_data_id, rrd_name, time, output) VALUES (2904,'numnds','2010-07-12 16:01:25','210'),(2904,'sum2pds','2010-07-12 16:01:25','264'),(2904,'sum2nds','2010-07-12 16:01:25','267'),(2904,'numpsd','2010-07-12 16:01:25','215'),(2904,'sumpds','2010-07-12 16:01:25','228'),(2904,'sumnsd','2010-07-12 16:01:25','245'),(2904,'numpds','2010-07-12 16:01:25','210'),(2904,'sum2psd','2010-07-12 16:01:25','302'),(2904,'numnsd','2010-07-12 16:01:25','225'),(2904,'sum2nsd','2010-07-12 16:01:25','285'),(2904,'sumnds','2010-07-12 16:01:25','229'),(2904,'sumpsd','2010-07-12 16:01:25','244'),(2905,'maxpds','2010-07-12 16:01:25','2'),(2905,'maxnsd','2010-07-12 16:01:25','2'),(2905,'maxnds','2010-07-12 16:01:25','2'),(2905,'maxpsd','2010-07-12 16:01:25','2'),(2906,'Late','2010-07-12 16:01:25','0'),(2906,'lossDS','2010-07-12 16:01:25','0'),(2906,'OOS','2010-07-12 16:01:25','0'),(2906,'MIA','2010-07-12 16:01:25','0'),(2906,'lossSD','2010-07-12 16:01:25','0'),(2906,'rtt','2010-07-12 16:01:25','8'),(2906,'rt'
SPINE: Time: 8.5870 s, Threads: 15, Hosts: 392
F:\spine>
If I then run poller.php I get this, but I have to press Ctrl+C to stop it, otherwise it takes ages to complete, if it completes at all. I have seen it go to 8000 seconds.
F:\spine>f:\php\php.exe f:\apache2\htdocs\cacti\poller.php
07/12/2010 04:03:01 PM - POLLER: Poller[0] WARNING: Scheduled Task is out of sync with the Poller Interval! The Poller Interval is '60' seconds, with a maximum of a '300' second Scheduled Task, but 313 seconds have passed since the last poll!
^C
F:\spine>
If I run spine, I see the poller_output row count go up but not clear down:
mysql> select count(*) from poller_output;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from poller_output;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from poller_output;
+----------+
| count(*) |
+----------+
| 194 |
+----------+
1 row in set (0.00 sec)
If I then run poller.php, it clears it.
I'm really stumped and would appreciate some assistance.
Cheers
Jay
F:\spine>spine.exe
NOTE: The Shell Command Exists in the current directory
SPINE: Using spine config file [spine.conf]
SPINE: Version 0.8.7e starting
07/12/2010 03:59:33 PM - SPINE: Poller[0] ERROR: SQL Failed! Error:'1062', Messa
ge:'Duplicate entry '1189-traffic_in-2010-07-12 15:59:32' for key 1', SQL Fragme
nt:'INSERT INTO poller_output (local_data_id, rrd_name, time, output) VALUES (11
81,'dropped_in','2010-07-12 15:59:32','455'),(1186,'asa_memused','2010-07-12 15:
59:32','131327984'),(1189,'traffic_in','2010-07-12 15:59:32','183434401160'),(11
89,'traffic_out','2010-07-12 15:59:32','15699780153'),(1190,'traffic_in','2010-0
7-12 15:59:32','19815734412'),(1190,'traffic_out','2010-07-12 15:59:32','7772439
6931')'
SPINE: Time: 9.2530 s, Threads: 15, Hosts: 392
F:\spine>spine.exe
NOTE: The Shell Command Exists in the current directory
SPINE: Using spine config file [spine.conf]
SPINE: Version 0.8.7e starting
SPINE: Time: 8.6600 s, Threads: 15, Hosts: 392
F:\spine>spine.exe
NOTE: The Shell Command Exists in the current directory
SPINE: Using spine config file [spine.conf]
SPINE: Version 0.8.7e starting
07/12/2010 04:01:25 PM - SPINE: Poller[0] ERROR: SQL Failed! Error:'1062', Messa
ge:'Duplicate entry '2914-traffic_out-2010-07-12 16:01:25' for key 1', SQL Fragm
ent:'INSERT INTO poller_output (local_data_id, rrd_name, time, output) VALUES (2
914,'traffic_out','2010-07-12 16:01:25','119542947'),(2914,'traffic_in','2010-07
-12 16:01:25','456242016'),(2915,'traffic_out','2010-07-12 16:01:25','3403303128
'),(2915,'traffic_in','2010-07-12 16:01:25','2837387479')'
07/12/2010 04:01:26 PM - SPINE: Poller[0] ERROR: SQL Failed! Error:'1062', Messa
ge:'Duplicate entry '2920-traffic_in-2010-07-12 16:01:25' for key 1', SQL Fragme
nt:'INSERT INTO poller_output (local_data_id, rrd_name, time, output) VALUES (29
16,'5min_cpu','2010-07-12 16:01:25','2'),(2920,'traffic_in','2010-07-12 16:01:25
','2142092695'),(2920,'traffic_out','2010-07-12 16:01:25','4271247815'),(2921,'t
raffic_in','2010-07-12 16:01:25','3474926298'),(2921,'traffic_out','2010-07-12 1
6:01:25','4114390338')'
07/12/2010 04:01:26 PM - SPINE: Poller[0] ERROR: SQL Failed! Error:'1062', Messa
ge:'Duplicate entry '2927-traffic_in-2010-07-12 16:01:25' for key 1', SQL Fragme
nt:'INSERT INTO poller_output (local_data_id, rrd_name, time, output) VALUES (29
23,'5sec_cpu','2010-07-12 16:01:25','5'),(2927,'traffic_in','2010-07-12 16:01:25
','3215481537'),(2927,'traffic_out','2010-07-12 16:01:25','935466407'),(2928,'tr
affic_in','2010-07-12 16:01:25','4031542309'),(2928,'traffic_out','2010-07-12 16
:01:25','3152348008')'
07/12/2010 04:01:26 PM - SPINE: Poller[0] ERROR: SQL Failed! Error:'1062', Messa
ge:'Duplicate entry '2904-numnds-2010-07-12 16:01:25' for key 1', SQL Fragment:'
INSERT INTO poller_output (local_data_id, rrd_name, time, output) VALUES (2904,'
numnds','2010-07-12 16:01:25','210'),(2904,'sum2pds','2010-07-12 16:01:25','264'
),(2904,'sum2nds','2010-07-12 16:01:25','267'),(2904,'numpsd','2010-07-12 16:01:
25','215'),(2904,'sumpds','2010-07-12 16:01:25','228'),(2904,'sumnsd','2010-07-1
2 16:01:25','245'),(2904,'numpds','2010-07-12 16:01:25','210'),(2904,'sum2psd','
2010-07-12 16:01:25','302'),(2904,'numnsd','2010-07-12 16:01:25','225'),(2904,'s
um2nsd','2010-07-12 16:01:25','285'),(2904,'sumnds','2010-07-12 16:01:25','229')
,(2904,'sumpsd','2010-07-12 16:01:25','244'),(2905,'maxpds','2010-07-12 16:01:25
','2'),(2905,'maxnsd','2010-07-12 16:01:25','2'),(2905,'maxnds','2010-07-12 16:0
1:25','2'),(2905,'maxpsd','2010-07-12 16:01:25','2'),(2906,'Late','2010-07-12 16
:01:25','0'),(2906,'lossDS','2010-07-12 16:01:25','0'),(2906,'OOS','2010-07-12 1
6:01:25','0'),(2906,'MIA','2010-07-12 16:01:25','0'),(2906,'lossSD','2010-07-12
16:01:25','0'),(2906,'rtt','2010-07-12 16:01:25','8'),(2906,'rt'
SPINE: Time: 8.5870 s, Threads: 15, Hosts: 392
F:\spine>
If I then run poller.php I get the following, but I have to press Ctrl+Z to stop it; otherwise it takes ages to complete, if it completes at all. I have seen it run for 8000 seconds.
F:\spine>f:\php\php.exe f:\apache2\htdocs\cacti\poller.php
07/12/2010 04:03:01 PM - POLLER: Poller[0] WARNING: Scheduled Task is out of syn
c with the Poller Interval! The Poller Interval is '60' seconds, with a maximum
of a '300' second Scheduled Task, but 313 seconds have passed since the last po
ll!
^C
F:\spine>
If I run spine, I see the poller_output row count go up but never clear down:
mysql> select count(*) from poller_output;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from poller_output;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from poller_output;
+----------+
| count(*) |
+----------+
| 194 |
+----------+
1 row in set (0.00 sec)
If I then run poller.php, it clears the table.
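The manual checks above (rerunning the count query and watching whether it drains) can be sketched in Python. This is only a sketch: `fetch_count` is a hypothetical callable that would run `select count(*) from poller_output` through whatever MySQL client is available, and the function names are illustrative, not part of Cacti.

```python
import time
from typing import Callable, List


def sample_counts(fetch_count: Callable[[], int], samples: int,
                  delay_s: float = 0.0) -> List[int]:
    """Call fetch_count() `samples` times, pausing delay_s seconds between calls."""
    counts = []
    for i in range(samples):
        counts.append(fetch_count())
        if delay_s > 0 and i < samples - 1:
            time.sleep(delay_s)
    return counts


def never_drains(counts: List[int]) -> bool:
    """True if rows appeared and the count never dropped between samples."""
    has_rows = max(counts, default=0) > 0
    non_decreasing = all(b >= a for a, b in zip(counts, counts[1:]))
    return has_rows and non_decreasing


if __name__ == "__main__":
    # Stand-in for the real query: replays the three counts shown in the post.
    replay = iter([0, 0, 194])
    observed = sample_counts(lambda: next(replay), samples=3)
    print(observed, "never drains:", never_drains(observed))
```

With a real `fetch_count`, a delay of a few seconds between samples would show whether poller.php is emptying the table while spine fills it.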
I'm really stumped and would appreciate some assistance.
Cheers
Jay
Cacti Version 0.8.7e, Spine 0.8.7e, Apache 2.2.15, Mysql 5.0.88, PHP 5.2.13, RRDTool 1.2.30, NET-SNMP 5.5
Quad Core AMD Opteron Processor 2384, 2.70Ghz, 2GB RAM , 1 CPU used
Windows Server 2003 (X64), VMWARE ESX
Plugins: Aggregate 0.75
SYSTEM STATS: Time:12.5140 Method:spine Processes:2 Threads:15 Hosts:400 HostsPerProcess:200 DataSources:2909 RRDsProcessed:1384
It looks to be purely the time the poller takes to complete that is the issue.
F:\spine>f:\php\php.exe f:\apache2\htdocs\cacti\poller.php
07/12/2010 05:21:07 PM - POLLER: Poller[0] WARNING: Scheduled Task is out of syn
c with the Poller Interval! The Poller Interval is '60' seconds, with a maximum
of a '300' second Scheduled Task, but 314 seconds have passed since the last po
ll!
07/12/2010 05:24:51 PM - SYSTEM STATS: Time:223.5528 Method:spine Processes:1 Th
reads:15 Hosts:383 HostsPerProcess:383 DataSources:2800 RRDsProcessed:1315
As you can see above, even though it reports 223.5528 seconds to complete, when I run it again pretty much straight away it says 788 seconds have passed. So even though it gives me the polling time above, it is a long time before it returns me to the command prompt.
F:\spine>f:\php\php.exe f:\apache2\htdocs\cacti\poller.php
07/12/2010 05:34:15 PM - POLLER: Poller[0] WARNING: Scheduled Task is out of syn
c with the Poller Interval! The Poller Interval is '60' seconds, with a maximum
of a '300' second Scheduled Task, but 788 seconds have passed since the last po
ll!
The above is all with the scheduled task removed.
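The timestamps in the two runs above can be checked with a little arithmetic. The Python sketch below uses only the three timestamps printed in the logs; the reading that the warning counts from the start of the previous poll is an inference from these numbers, not something taken from the Cacti source.

```python
from datetime import datetime

# Timestamp layout used by the Cacti log lines in this thread.
FMT = "%m/%d/%Y %I:%M:%S %p"


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two Cacti-style log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


first_start = "07/12/2010 05:21:07 PM"   # first out-of-sync warning
first_stats = "07/12/2010 05:24:51 PM"   # SYSTEM STATS line of the first run
second_start = "07/12/2010 05:34:15 PM"  # second out-of-sync warning

print(seconds_between(first_start, second_start))  # 788.0
print(seconds_between(first_start, first_stats))   # 224.0
print(seconds_between(first_stats, second_start))  # 564.0
```

The 788 seconds in the second warning equals the gap between the two start times, and 224 seconds agrees with the reported Time:223.5528. That leaves up to 564 seconds between the SYSTEM STATS line and the next run starting, which fits the long wait for the prompt.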
Cheers
MySQL error 1062 is duplicate entry.
Are there any hung php, perl, cmd, etc processes on your cacti server? Rebooted the server?
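The "Duplicate entry" text in the spine errors appears to join the three key columns of the INSERT (local_data_id, rrd_name, time) with hyphens. The Python sketch below assumes that layout, and assumes the 80-column console wrapping has been undone first. It splits the entry apart so that repeated offenders can be counted. `parse_duplicate` and `DuplicateKey` are illustrative names, not part of Cacti or spine.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: '<local_data_id>-<rrd_name>-<YYYY-MM-DD HH:MM:SS>'
DUPLICATE_RE = re.compile(
    r"Duplicate entry '(\d+)-(.+)-(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})' for key"
)


class DuplicateKey(NamedTuple):
    local_data_id: int
    rrd_name: str
    time: str


def parse_duplicate(message: str) -> Optional[DuplicateKey]:
    """Extract the duplicated key from a MySQL 1062 message, or None if absent."""
    match = DUPLICATE_RE.search(message)
    if match is None:
        return None
    return DuplicateKey(int(match.group(1)), match.group(2), match.group(3))


if __name__ == "__main__":
    line = ("SPINE: Poller[0] ERROR: SQL Failed! Error:'1062', Message:"
            "'Duplicate entry '2914-traffic_out-2010-07-12 16:01:25' for key 1'")
    print(parse_duplicate(line))
```

If the same timestamp shows up for many different data sources, that would point to two poller runs writing rows for the same second rather than to a single bad data source.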
| Scripts: Monitor processes | RFC1213 MIB | DOCSIS Stats | Dell PowerEdge | Speedfan | APC UPS | DOCSIS CMTS | 3ware | Motorola Canopy |
| Guides: Windows Install | [HOWTO] Debug Windows NTFS permission problems |
| Tools: Windows All-in-one Installer |
Hi
Thanks for responding.
The duplicate entry only appears now and again, probably when I had the scheduled task running and was also running the poller from the command line, so that might be a red herring.
No hung tasks from what I can see. I have rebooted several times. I have also been restarting MySQL and Apache.
The server itself seems to be responding OK. I have been monitoring CPU and memory with the Sysinternals tools and couldn't see any obviously high usage.
Cheers
Jay
Ah, so your poller_output table issue isn't occurring on every polling cycle? If not, then it does sound like a script isn't timing out properly or something else is drastically taking longer than it should. I'd increase the cacti.log logging level in hopes of catching some sort of indication of what is going wrong.
Before this happened I had been getting sub-20-second polling. My signature stats are a tiny bit out of date. Most of my templates were SNMP based, and only once I had all of those set up did I start adding the PHP script based ones to my server. I had then been adding devices to these templates slowly so I could keep an eye on polling times. At the moment, any devices that use the possibly slow scripts have been disabled.
I have probably posted a bit too much information which may be causing some confusion.
Right now the scheduled task has been removed.
I have been running poller.php from the CLI.
When I run it I get the "Poller[0] WARNING: Scheduled Task is out of sync with the Poller Interval!" error message.
It then takes ages to complete the cycle.
When I then run it again I get exactly the same thing.
Now if I Ctrl+Z the poller from the CLI and then run the command again, it doesn't output anything and just goes back to the prompt. But in the background the task is still completing, and it's not until it's done that I can try again and get the out-of-sync error again.
To me it looks like the poller is taking an age to complete, hence the error messages. If the scheduled task is set up I see the same behaviour, where it will only log the out-of-sync message once the task has completed, but this has been taking hundreds of seconds or more to complete.
Oh, and I do see the "WARNING: Poller Output Table not Empty" message quite regularly, but haven't seen it on the last 2 polls I have done.
Regarding scripts, I disabled the devices that use them. There may be just the Cisco memory one in use currently. But that seems like a fast script and shouldn't cause any issues, unlike the NBAR and CBWFQ ones, which I have disabled.
I hope that makes it clearer.
Cheers
Jay
Hi
I have enabled debug logging and run poller.php. Here is the output. You can see that I get an out-of-sync message straight away. It then takes 51 seconds for Spine to complete and then a few minutes for the RRD updates to finish.
See attached.
Attachment: Cacti_debug.txt (924.13 KiB)
Hi
I think I may have cracked it. Early days though. I disabled all but 3 devices and manually ran the poller, and I was able to get it to run without any errors. I ran this a few times, then set up my scheduled task and rebooted, and it seems to be polling OK now. I'm now going to start enabling small groups of devices and keep an eye on polling times.
3 device poll
07/13/2010 12:31:01 PM - SYSTEM STATS: Time:1.1777 Method:spine Processes:1 Threads:15 Hosts:4 HostsPerProcess:4 DataSources:2 RRDsProcessed:2
More devices added
07/13/2010 12:38:02 PM - SYSTEM STATS: Time:1.9336 Method:spine Processes:1 Threads:15 Hosts:16 HostsPerProcess:16 DataSources:74 RRDsProcessed:38
I will update with progress but so far so good. Fingers crossed. I still don't know why it failed in the first place.
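The approach described here (re-enable devices in small groups and watch the poll time after each group) can be sketched in Python. Everything in the sketch is hypothetical: `measure_poll_time` stands in for running the poller and reading the Time value from the SYSTEM STATS line, and the device names and threshold are made up.

```python
from typing import Callable, List, Optional, Sequence


def enable_in_batches(devices: Sequence[str], batch_size: int,
                      measure_poll_time: Callable[[List[str]], float],
                      max_seconds: float) -> Optional[List[str]]:
    """Enable devices batch by batch.

    Returns the batch that first pushed the poll time over max_seconds,
    or None if every batch stayed within the limit.
    """
    enabled: List[str] = []
    for start in range(0, len(devices), batch_size):
        batch = list(devices[start:start + batch_size])
        enabled.extend(batch)
        # Pass a copy so the measuring function cannot alter our state.
        if measure_poll_time(list(enabled)) > max_seconds:
            return batch
    return None


if __name__ == "__main__":
    # Fake measurement: 0.1 s per device, plus 50 s if one slow device is on.
    def fake_poll(enabled: List[str]) -> float:
        return 0.1 * len(enabled) + (50.0 if "slow" in enabled else 0.0)

    suspects = enable_in_batches(["a", "b", "c", "slow", "d", "e"], 2,
                                 fake_poll, max_seconds=20.0)
    print("first batch over the limit:", suspects)
```

Smaller batches narrow the suspect list faster per finding but need more polling cycles overall.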
Cheers
Jay
I think the problem was related to our AV. I started enabling more hosts throughout today and the polling times went above what I was used to seeing, and this was with far fewer than half the devices added. I looked through the Windows logs and noticed some AV updates were rolled out at about the same time Cacti went down. I asked them to disable AV for the Cacti directory and the polling times dropped, as shown:
After
07/13/2010 04:46:10 PM - SYSTEM STATS: Time:8.8631 Method:spine Processes:2 Threads:15 Hosts:127 HostsPerProcess:64 DataSources:751 RRDsProcessed:397
Before
07/13/2010 04:45:31 PM - SYSTEM STATS: Time:29.5730 Method:spine Processes:2 Threads:15 Hosts:127 HostsPerProcess:64 DataSources:751 RRDsProcessed:399
I'm hoping I can now continue adding the rest of the hosts without seeing any more issues.
Hoping this might help others who have had similar issues.
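To put a number on the before and after figures, the two SYSTEM STATS lines can be parsed and compared. This is a minimal Python sketch that assumes each line is a run of space-separated `key:value` pairs after the "SYSTEM STATS:" marker, as in the lines quoted in this thread. `parse_system_stats` is an illustrative helper, not a Cacti function.

```python
import re
from typing import Dict


def parse_system_stats(line: str) -> Dict[str, str]:
    """Return the key:value pairs that follow 'SYSTEM STATS:' as a dict of strings."""
    _, found, stats = line.partition("SYSTEM STATS:")
    if not found:
        return {}
    return dict(re.findall(r"(\w+):(\S+)", stats))


before = parse_system_stats(
    "07/13/2010 04:45:31 PM - SYSTEM STATS: Time:29.5730 Method:spine "
    "Processes:2 Threads:15 Hosts:127 HostsPerProcess:64 DataSources:751 "
    "RRDsProcessed:399")
after = parse_system_stats(
    "07/13/2010 04:46:10 PM - SYSTEM STATS: Time:8.8631 Method:spine "
    "Processes:2 Threads:15 Hosts:127 HostsPerProcess:64 DataSources:751 "
    "RRDsProcessed:397")

speedup = float(before["Time"]) / float(after["Time"])
reduction_pct = (1 - float(after["Time"]) / float(before["Time"])) * 100

print(f"hosts: {before['Hosts']} -> {after['Hosts']}")
print(f"poll time: {before['Time']} s -> {after['Time']} s "
      f"({speedup:.2f}x faster, {reduction_pct:.1f}% less)")
```

Because the host and data source counts are identical in both lines, the difference in Time is not explained by a change in load between the two polls.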
Cheers
Jay
Yikes, that's quite an impact. Hopefully it solves your issue.