Cacti script not responding.. cron out of sync with poller


switched
Posts: 29
Joined: Fri Nov 21, 2008 4:15 pm

Post by switched »

All,
I have an issue with my Cacti install (0.8.7c).
I have only 25 hosts, with about 500 RRDs, both on local subnets and on the internet.

When our corporate internet feed goes down, I get no graphing on any hosts during that period. No data appears even on the localhost graph, which suggests an issue with the poller.

Currently my setup is as follows:
Poller = SPINE
Poller Interval = 1 min
Cron interval = 1 min
My crontab =
*/1 * * * * php /var/www/html/poller.php > /dev/null 2>&1
0 1 * * * nice -n 15 /var/www/backup.sh

Attached is the log file. At approx 11:04 the corporate internet feed went down, and then the errors come through.

Can anyone tell me how I can address this issue?

I read that maybe I should set crontab to 5-minute intervals and set Cacti to the same? If I make this change, will it cause any issues with my existing data or stop data from being produced?

Code: Select all

04/19/2010 11:03:10 PM - SYSTEM STATS: Time:8.9122 Method:spine Processes:1 Threads:3 Hosts:25 HostsPerProcess:25 DataSources:835 RRDsProcessed:503
04/19/2010 11:04:11 PM - SYSTEM STATS: Time:8.9058 Method:spine Processes:1 Threads:3 Hosts:25 HostsPerProcess:25 DataSources:836 RRDsProcessed:504
04/19/2010 11:05:45 PM - POLLER: Poller[0] WARNING: Cron is out of sync with the Poller Interval!  The Poller Interval is '60' seconds, with a maximum of a '300' second Cron, but 103 seconds have passed since the last poll!
04/19/2010 11:06:42 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/19/2010 11:06:44 PM - POLLER: Poller[0] Maximum runtime of 58 seconds exceeded. Exiting.
04/19/2010 11:06:44 PM - SYSTEM STATS: Time:58.3840 Method:spine Processes:1 Threads:3 Hosts:25 HostsPerProcess:25 DataSources:835 RRDsProcessed:0
04/19/2010 11:06:52 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/19/2010 11:07:03 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/19/2010 11:07:13 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/19/2010 11:07:23 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/19/2010 11:07:33 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/19/2010 11:07:42 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/19/2010 11:07:43 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/19/2010 11:07:44 PM - POLLER: Poller[0] Maximum runtime of 58 seconds exceeded. Exiting.
04/19/2010 11:07:44 PM - SYSTEM STATS: Time:58.3997 Method:spine Processes:1 Threads:3 Hosts:25 HostsPerProcess:25 DataSources:835 RRDsProcessed:0
04/19/2010 11:07:52 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/19/2010 11:07:53 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/19/2010 11:08:03 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/19/2010 11:08:03 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/19/2010 11:08:13 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/19/2010 11:08:13 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/19/2010 11:08:14 PM - SPINE: Poller[0] ERROR: SS[0] Script Server did not start properly return message was: 'U'
04/19/2010 11:08:14 PM - SPINE: Poller[0] ERROR: SS[0] Script Server did not start properly return message was: 'U'
04/19/2010 11:08:14 PM - SPINE: Poller[0] ERROR: SS[0] Script Server did not start properly return message was: 'U'
04/19/2010 11:08:14 PM - SPINE: Poller[0] ERROR: SS[999] Script Server did not start properly return message was: 'U'
04/19/2010 11:08:14 PM - SPINE: Poller[0] ERROR: SS[0] Script Server did not start properly return message was: 'U'
04/19/2010 11:08:14 PM - SPINE: Poller[0] ERROR: SS[0] Script Server did not start properly return message was: 'U'
04/19/2010 11:08:14 PM - SPINE: Poller[0] ERROR: SS[0] Script Server did not start properly return message was: 'U'
04/19/2010 11:08:14 PM - SPINE: Poller[0] ERROR: SS[0] Script Server did not start properly return message was: 'U'
04/19/2010 11:08:14 PM - SPINE: Poller[0] ERROR: SS[0] Script Server did not start properly return message was: 'U'
04/19/2010 11:08:14 PM - SPINE: Poller[0] ERROR: SS[0] Script Server did not start properly return message was: 'U'
04/19/2010 11:08:14 PM - SPINE: Poller[0] ERROR: SS[0] Script Server did not start properly return message was: 'U'
04/19/2010 11:08:14 PM - SPINE: Poller[0] ERROR: SS[0] Script Server did not start properly return message was: 'U'
04/19/2010 11:08:14 PM - SPINE: Poller[0] ERROR: SS[0] Script Server did not start properly return message was: 'U'
04/19/2010 11:08:14 PM - SPINE: Poller[0] ERROR: SS[999] Script Server did not start properly return message was: 'U'
04/19/2010 11:08:22 PM - SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal
04/19/2010 11:08:22 PM - SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal
04/19/2010 11:08:22 PM - SPINE: Poller[0] ERROR: SQL Failed! Error:'1062', Message:'Duplicate entry '67-cisco_memfree-2010-04-19 23:08:22' for key 1', SQL Fragment:'INSERT INTO poller_output (local_data_id, rrd_name, time, output) VALUES (67,'cisco_memfree','2010-04-19 23:08:22','195707452'),(68,'cisco_memused','2010-04-19 23:08:22','19748068'),(69,'cisco_tempcurInlet','2010-04-19 23:08:22','18'),(70,'cisco_tempcurOutlet','2010-04-19 23:08:22','22'),(71,'cisco_tempthr','2010-04-19 23:08:22','50'),(72,'1min_cpu','2010-04-19 23:08:22','0'),(73,'5min_cpu','2010-04-19 23:08:22','0'),(74,'5sec_cpu','2010-04-19 23:08:22','0'),(75,'cisco_memfree','2010-04-19 23:08:22','195668836'),(76,'cisco_memused','2010-04-19 23:08:22','19748832'),(77,'Syd_Slot_0_CPU','2010-04-19 23:08:22','0'),(78,'Syd_Slot_1_CPU','2010-04-19 23:08:22','0'),(79,'Syd_Slot_4_CPU','2010-04-19 23:08:22','0'),(80,'Syd_Slot_5_CPU','2010-04-19 23:08:22','0'),(81,'Syd_Slot_6_CPU','2010-04-19 23:08:22','0'),(82,'traffic_out','2010-04-19 23:08:22','183876488'),(82,'traffic_in','2010-04-19 23:08:22','0'),(83,'traffic_out','2010-04-19 23:08:22','1737182558'),(83,'traffic_in','2010-04-19 23:08:22','3267660'),(84,'traff'
04/19/2010 11:08:22 PM - SPINE: Poller[0] ERROR: SQL Failed! Error:'1062', Message:'Duplicate entry '26-cisco_memfree-2010-04-19 23:08:22' for key 1', SQL Fragment:'INSERT INTO poller_output (local_data_id, rrd_name, time, output) VALUES (26,'cisco_memfree','2010-04-19 23:08:22','15931512'),(27,'cisco_memused','2010-04-19 23:08:22','34400136'),(28,'cisco_tempcurInlet','2010-04-19 23:08:22','21'),(29,'cisco_tempcurOutlet','2010-04-19 23:08:22','26'),(30,'cisco_tempthr','2010-04-19 23:08:22','50'),(31,'1min_cpu','2010-04-19 23:08:22','5'),(32,'5min_cpu','2010-04-19 23:08:22','6'),(33,'5sec_cpu','2010-04-19 23:08:22','5'),(34,'cisco_memfree','2010-04-19 23:08:22','150935584'),(35,'cisco_memused','2010-04-19 23:08:22','34400916'),(36,'Ade_Slot_0_CPU','2010-04-19 23:08:22','U'),(37,'Ade_Slot_1_CPU','2010-04-19 23:08:22','U'),(38,'Ade_Slot_4_CPU','2010-04-19 23:08:22','U'),(39,'Ade_Slot_5_CPU','2010-04-19 23:08:22','U'),(176,'traffic_in','2010-04-19 23:08:22','1690249939'),(176,'traffic_out','2010-04-19 23:08:22','4093524628'),(177,'traffic_out','2010-04-19 23:08:22','2876962174'),(177,'traffic_in','2010-04-19 23:08:22','3889790736'),(178,'traffic_out','2010-04-19 23:08:22',''
04/19/2010 11:08:23 PM - SPINE: Poller[0] ERROR: SS[0] PHP Script Server communications lost.  Restarting PHP Script Server
04/19/2010 11:08:23 PM - SYSTEM STATS: Time:34.7722 Method:spine Processes:1 Threads:3 Hosts:25 HostsPerProcess:25 DataSources:835 RRDsProcessed:39
04/19/2010 11:08:23 PM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
04/19/2010 11:09:02 PM - POLLER: Poller[0] WARNING: Cron is out of sync with the Poller Interval!  The Poller Interval is '60' seconds, with a maximum of a '300' second Cron, but 74 seconds have passed since the last poll!
04/19/2010 11:09:02 PM - POLLER: Poller[0] WARNING: Poller Output Table not Empty.  Issues Found: 777, Data Sources: mem_buffers(DS[19]), mem_swap(DS[20]), (DS[21]), (DS[22]), (DS[23]), proc(DS[25]), cisco_memfree(DS[26]), cisco_memused(DS[27]), cisco_tempcurInlet(DS[28]), cisco_tempcurOutlet(DS[29]), cisco_tempthr(DS[30]), 1min_cpu(DS[31]), 5min_cpu(DS[32]), 5sec_cpu(DS[33]), cisco_memfree(DS[34]), cisco_memused(DS[35]), Ade_Slot_0_CPU(DS[36]), Ade_Slot_1_CPU(DS[37]), Ade_Slot_4_CPU(DS[38]), Ade_Slot_5_CPU(DS[39]), cisco_memfree(DS[67]), Additional Issues Remain.  Only showing first 20
04/19/2010 11:09:11 PM - SYSTEM STATS: Time:9.5580 Method:spine Processes:1 Threads:3 Hosts:25 HostsPerProcess:25 DataSources:836 RRDsProcessed:504
04/19/2010 11:10:10 PM - SYSTEM STATS: Time:9.0814 Method:spine Processes:1 Threads:3 Hosts:25 HostsPerProcess:25 DataSources:836 RRDsProcessed:504
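For anyone reading a log like the one above, a small filter can make the failed cycles stand out. This is only a sketch: it assumes the `SYSTEM STATS` line format shown in this post, and the function name `flag_bad_polls` and the 58-second default (borrowed from the "Maximum runtime of 58 seconds" message) are illustrative choices, not part of Cacti.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cacti): read a Cacti log on stdin and
# print the SYSTEM STATS lines where the poll ran longer than a limit
# or processed no RRDs at all.
flag_bad_polls() {
    limit="${1:-58}"   # seconds; assumed default, taken from the log message
    awk -v limit="$limit" '
        /SYSTEM STATS:/ {
            t = ""; r = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^Time:/)          { t = substr($i, 6) }
                if ($i ~ /^RRDsProcessed:/) { r = substr($i, 15) }
            }
            # A missing RRDsProcessed field counts as zero and is flagged.
            if (t + 0 > limit + 0 || r + 0 == 0) print
        }'
}

# Example (the log path is an assumption, adjust to your install):
#   flag_bad_polls 58 < /var/www/html/log/cacti.log
```

Note that a cycle which finishes early but processes only part of the RRDs (such as the 11:08:23 line with `RRDsProcessed:39`) would not be flagged by this simple test.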
Linegod
Developer
Posts: 1626
Joined: Thu Feb 20, 2003 10:16 am
Location: Canada

Post by Linegod »

You've already picked your ditch with 1 minute polling (which personally, I despise), so changing that isn't going to help.

Are you using IPs or FQDN for hostnames? Are you using an external DNS?
--
Live fast, die young
You're sucking up my bandwidth.

J.P. Pasnak,CD
CCNA, LPIC-1
http://www.warpedsystems.sk.ca
switched
Posts: 29
Joined: Fri Nov 21, 2008 4:15 pm

Post by switched »

All devices are by IP address only (no domain names).
Linegod
Developer
Posts: 1626
Joined: Thu Feb 20, 2003 10:16 am
Location: Canada

Post by Linegod »

What OS? (Post the output of 'System Utilities/Technical Support')

What about your MySQL server? Is it addressed by IP?
switched
Posts: 29
Joined: Fri Nov 21, 2008 4:15 pm

Post by switched »

Linegod wrote:What OS? (Post the output of 'System Utilities/Technical Support')

What about your MySQL server? Is it addressed by IP?
Hi Linegod, see below. We used CactiEZ to load Cacti onto our server.

Below that is the MySQL config file from /etc/my.cnf.

Code: Select all

 	Wed, 21 Apr 2010 04:55:55 +0000
Cacti Version 	0.8.7c
Cacti OS 	unix
SNMP Version 	net-snmp
RRDTool Version 	RRDTool 1.2.x
Hosts 	24
Graphs 	408
Data Sources 	Script/Command: 6
SNMP: 166
SNMP Query: 336
Script - Script Server (PHP): 1
Total: 509
Poller Information
Interval 	60
Type 	spine
Items 	Action[0]: 830
Action[1]: 6
Action[2]: 1
Total: 837
Concurrent Processes 	1
Max Threads 	3
PHP Servers 	10
Script Timeout 	10
Max OID 	10
Last Run Statistics 	Time:8.8731 Method:spine Processes:1 Threads:3 Hosts:25 HostsPerProcess:25 DataSources:835 RRDsProcessed:503
PHP Information
PHP Version 	5.1.6
PHP OS 	Linux
PHP uname 	Linux localhost.localdomain 2.6.9-78.0.17.plus.c4 #1 Mon Apr 20 14:21:42 EDT 2009 i686
PHP SNMP 	Installed
max_execution_time 	30
memory_limit 	32M
Also, here is the MySQL config file:

Code: Select all

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

# To allow mysqld to connect to a MySQL Cluster management daemon, uncomment
# these lines and adjust the connectstring as needed.
#ndbcluster
#ndb-connectstring="nodeid=4;host=localhost:1186"

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

[ndbd]
# If you are running a MySQL Cluster storage daemon (ndbd) on this machine,
# adjust its connection to the management daemon here.
# Note: ndbd init script requires this to include nodeid!
connect-string="nodeid=2;host=localhost:1186"

[ndb_mgm]
# connection string for MySQL Cluster management tool
connect-string="host=localhost:1186"
Linegod
Developer
Posts: 1626
Joined: Thu Feb 20, 2003 10:16 am
Location: Canada

Post by Linegod »

And the devices you are monitoring are not on the other side of the corporate Internet access, nor do you use that access or device for routing?
switched
Posts: 29
Joined: Fri Nov 21, 2008 4:15 pm

Post by switched »

Many of the devices are on our private corporate network, but there are 5 or 6 that are monitored via their public IP addresses, which traverse the corporate internet feed. When our internet connection (i.e. our corporate internet feed) goes down, ALL graphs go blank, and these errors, among others, come up in the log:

Code: Select all

04/19/2010 11:08:13 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted 
All graphs will display blanks during this period including the Localhost.

I have increased the CRON to 5 minutes and adjusted "Cron Interval" to 5 under System -> Poller.
Linegod
Developer
Posts: 1626
Joined: Thu Feb 20, 2003 10:16 am
Location: Canada

Post by Linegod »

switched wrote: I have increased the CRON to 5 minutes and adjusted "Cron Interval" to 5 under System -> Poller.
That is just going to mess things up. If you configured 1 minute polling correctly, your heartbeat is going to be out of sync.

Odds are it just can't complete in 60 seconds. You might have better luck increasing the processes (twice the number of CPUs in the box).

If you are going to change to 5 minute polling, either recreate your graphs or spend some time reading up on rrdtune.
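As a rough illustration of the heartbeat part only, `rrdtool tune` accepts `--heartbeat ds-name:seconds`. The sketch below just prints the command rather than running it. The helper name `heartbeat_cmd`, the example file and data source names, and the "heartbeat = 2 x polling interval" rule are assumptions for illustration; whether the RRD step and RRAs also need changing is a separate question this does not cover.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the rrdtool command that would set
# one data source's heartbeat to twice the new polling interval.
# The 2x factor is an assumed convention, not something stated in this thread.
heartbeat_cmd() {
    rrd_file="$1"   # path to the .rrd file
    ds_name="$2"    # data source name inside that file, e.g. traffic_in
    interval="$3"   # new polling interval in seconds
    echo "rrdtool tune $rrd_file --heartbeat $ds_name:$((interval * 2))"
}

# Example: show the command for a 5-minute (300 s) interval.
heartbeat_cmd traffic.rrd traffic_in 300
```

Printing the command first makes it easy to review what would change before touching any RRD file; back up the files before running anything for real.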
switched
Posts: 29
Joined: Fri Nov 21, 2008 4:15 pm

Post by switched »

As I said, it was set up using CactiEZ, so I didn't have to configure much at all.

Attached is an image of my Poller settings.

What would be a good number to set the Threads Per Process to?


Image
switched
Posts: 29
Joined: Fri Nov 21, 2008 4:15 pm

Post by switched »

OK, a day or two ago I was informed that the corporate internet went down again. ALL graphing stopped, even on the localhost.
Here is the log file:

Code: Select all

04/21/2010 03:31:10 PM - SYSTEM STATS: Time:8.2975 Method:spine Processes:1 Threads:6 Hosts:25 HostsPerProcess:25 DataSources:835 RRDsProcessed:503
04/21/2010 03:32:59 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/21/2010 03:33:09 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/21/2010 03:33:19 PM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted
04/21/2010 03:33:19 PM - SPINE: Poller[0] ERROR: SS[0] Script Server did not start properly return message was: 'U'
04/21/2010 03:33:19 PM - SPINE: Poller[0] ERROR: SS[0] Script Server did not start properly return message was: 'U'
04/21/2010 03:33:19 PM - SPINE: Poller[0] ERROR: SS[999] Script Server did not start properly return message was: 'U'
04/21/2010 03:33:24 PM - SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal
04/21/2010 03:33:24 PM - SPINE: Poller[0] ERROR: SS[0] PHP Script Server communications lost.  Restarting PHP Script Server
04/21/2010 03:33:24 PM - SYSTEM STATS: Time:82.6849 Method:spine Processes:1 Threads:6 Hosts:25 HostsPerProcess:25 DataSources:835 RRDsProcessed:97
04/21/2010 03:33:25 PM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
04/21/2010 03:33:32 PM - SYSTEM STATS: Time:7.6216 Method:spine Processes:1 Threads:6 Hosts:25 HostsPerProcess:25 DataSources:835 RRDsProcessed:504
04/21/2010 03:34:32 PM - SYSTEM STATS: Time:7.8798 Method:spine Processes:1 Threads:6 Hosts:25 HostsPerProcess:25 DataSources:835 RR
Linegod
Developer
Posts: 1626
Joined: Thu Feb 20, 2003 10:16 am
Location: Canada

Post by Linegod »

OK, what PHP scripts are you using, and what are they doing?
switched
Posts: 29
Joined: Fri Nov 21, 2008 4:15 pm

Post by switched »

Looking in the Data Input Methods I have this:


Image




The Local Poller Statistics entry is a script I found online and just added. It displays the poller runtime, processes, threads, hosts, hosts per process, data sources and RRDs. I am happy to delete it as I don't use it anyhow. Would deleting it help with my issue?




Image
Linegod
Developer
Posts: 1626
Joined: Thu Feb 20, 2003 10:16 am
Location: Canada

Post by Linegod »

It shouldn't cause any issue, as it just talks to the database - but it shouldn't time out either, unless your MySQL server is being addressed by an FQDN, and not an IP address.

Try removing it, and I'll take a look at it - maybe it's doing something it shouldn't....
switched
Posts: 29
Joined: Fri Nov 21, 2008 4:15 pm

Post by switched »

Linegod wrote:It shouldn't cause any issue, as it just talks to the database - but it shouldn't time out either, unless your MySQL server is being addressed by an FQDN, and not an IP address.

Try removing it, and I'll take a look at it - maybe it's doing something it shouldn't....
OK, I've removed that script (not deleted it though - it still exists under Data Input Methods). I think I have uploaded what you need here:

http://www.mediafire.com/?gmmmnmui3oc

How do I deal with the MySQL server being addressed by an FQDN? Can I upload a certain file to show you the contents?
Linegod
Developer
Posts: 1626
Joined: Thu Feb 20, 2003 10:16 am
Location: Canada

Post by Linegod »

Well, are you calling your database by a FQDN in include/config.php?
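One way to check this from the shell is sketched below. It assumes the setting is a double-quoted `$database_hostname = "...";` line, which is how it appears in stock Cacti 0.8.x configs as far as I know, and that Cacti lives under `/var/www/html` (inferred from the crontab earlier in the thread). The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: pull the double-quoted value of $database_hostname
# out of a Cacti config file (assumed variable name and quoting style).
db_host_from_config() {
    sed -n 's/^[[:space:]]*\$database_hostname[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Hypothetical helper: rough classification of that value.
classify_db_host() {
    case "$1" in
        localhost)  echo "localhost" ;;
        *[!0-9.]*)  echo "hostname" ;;  # has characters other than digits and dots
        *.*.*.*)    echo "ipv4" ;;      # digits and dots only, at least three dots
        *)          echo "unknown" ;;
    esac
}

# Example (path is an assumption based on the crontab in this thread):
#   classify_db_host "$(db_host_from_config /var/www/html/include/config.php)"
```

A result of `hostname` would mean the database connection depends on name resolution, which is the case being asked about here. The classification is deliberately crude and does not validate that an `ipv4` result is a legal address.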