SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hos

Post general support questions here that do not specifically fall into the Linux or Windows categories.

Moderators: Developers, Moderators

kingaru1
Posts: 7
Joined: Thu Feb 02, 2012 8:38 am

SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hos

Post by kingaru1 »

Hello,

Well ... running latest version Cacti + latest spine.
having great issues with graphs and rrd not being created for required hosts and or graphs being broken.
It looks like the issue arises when I add more than certain number of items for monitoring.
Cron set for 1 min. Cron interval and Grpah intervals set for 1 min as well.
Keep getting constantly "SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal " in the log :(
Can somebody please help with this?

Thanks
K.
kingaru1
Posts: 7
Joined: Thu Feb 02, 2012 8:38 am

Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing

Post by kingaru1 »

C'mon guys ... I don't believe that I'm so unique in the issue ...
Why collects data and does not draw graph or stops creating rrd files at the certain poin!?
Also I dont think this is the issue comes from Spine ... switching to cmd.php does not fix the problem but just transforms error in to a warning "POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 8, Data Sources: blah blah blah "
Could you please at least let me know at least where to dig in? I searched through the posts regarding similar issue but all issues/fixes going back to 2005-2006 ... several releases back ... where all issues were caused by some sort bugs and should be fixed unfortunately it looks like not all of them fixed therefore I need help to help you and myself to fix it ...

Regards
K.

P.S. Just come up in the log " POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate" what is that?
P.P.S Cacti runs on UBUNTU server 11.10
User avatar
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany
Contact:

Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing

Post by gandalf »

The overrun issue is thrown in cases, when the poller does not complete within a poller interval (300 sec by default). This often occurs in case of unresponsive targets and/or insane timeouts (e.g. default timeouts) of commands used within scripts.
R.
kingaru1
Posts: 7
Joined: Thu Feb 02, 2012 8:38 am

Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing

Post by kingaru1 »

gandalf wrote:The overrun issue is thrown in cases, when the poller does not complete within a poller interval (300 sec by default). This often occurs in case of unresponsive targets and/or insane timeouts (e.g. default timeouts) of commands used within scripts.
R.
Thanks a lot for the feedback. At least someone is reacted at hopefully we can find a solution. Yes, besides general poling traffic of the interfaces via standard way, I also run some of my scripts to pull additional stats. I get all relies from ALL hosts without any problems. Id does not take more than 40 seconds to complete pooling for any of the devices that I have on the list. I.e. for some of the devices it maybe 10 seconds for some more remote 40 (absolute maximum) therefore if my pooling interval 60 seconds that should not be a problem. Correct? Is there a way that I can debug this? IN the log I see following as well

02/07/2012 02:01:00 PM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 3, Data Sources: aaaa(DS[3628]), bbb(DS[3628]), ccc(DS[3631])
02/07/2012 02:01:00 PM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate
02/07/2012 02:01:00 PM - POLLER: Poller[0] NOTE: Poller Int: '60', Cron Int: '60', Time Since Last: '59', Max Runtime '58', Poller Runs: '1'
02/07/2012 02:01:00 PM - SYSTEM STATS: Time:58.1966 Method:cmd.php Processes:10 Threads:N/A Hosts:41 HostsPerProcess:5 DataSources:357 RRDsProcessed:161
02/07/2012 02:01:00 PM - POLLER: Poller[0] Maximum runtime of 58 seconds exceeded. Exiting.

So what is the the runtime that I'm exceeded???? I did not exceeded this time for any of the hosts... or are you taking aggregated time for all hosts? If this is the case then I think it is wrong. pooler should be counted separately for each individual device not for all of them together ...

K
User avatar
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany
Contact:

Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing

Post by gandalf »

kingaru1 wrote: Id does not take more than 40 seconds to complete pooling for any of the devices that I have on the list. I.e. for some of the devices it maybe 10 seconds for some more remote 40 (absolute maximum) therefore if my pooling interval 60 seconds that should not be a problem. Correct?
In general, this is not correct. In fact, you can't exactly determine, which data sources are polled by which polling process. If I remember correctly, we try to poll a single host by exactly one polling process. If it happens, that two "slow" devices are polled by the same process, polling will time out. And in fact, a response time of 40 sec. is really sloooow, especially when trying to use 60 sec polling interval.
You may want to use multi-threading for that host (personally, I never used that).
You may run spine in debug level 3 to get polling times for each host. This may help for debugging
R.
kingaru1
Posts: 7
Joined: Thu Feb 02, 2012 8:38 am

Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing

Post by kingaru1 »

gandalf wrote:
kingaru1 wrote: Id does not take more than 40 seconds to complete pooling for any of the devices that I have on the list. I.e. for some of the devices it maybe 10 seconds for some more remote 40 (absolute maximum) therefore if my pooling interval 60 seconds that should not be a problem. Correct?
In general, this is not correct. In fact, you can't exactly determine, which data sources are polled by which polling process. If I remember correctly, we try to poll a single host by exactly one polling process. If it happens, that two "slow" devices are polled by the same process, polling will time out. And in fact, a response time of 40 sec. is really sloooow, especially when trying to use 60 sec polling interval.
You may want to use multi-threading for that host (personally, I never used that).
You may run spine in debug level 3 to get polling times for each host. This may help for debugging
R.

OK ... it took me a while to get back to this issue but I made some tests and found that there is no a single host thread that takes more than a 50-55 seconds to complete.
I have 155 hosts configured in cacti and each host runs 6 threads. In some worse cases it takes 50-52 seconds for some of the threads to completed , so I _never_ exceed 60 seconds pooling time for any running [TH] (thread). I assume cacti must empty Output table for the each thread up on completion it's pooling tasks, however I feel that it happens only when the entire pooling process (that includes several threads) for the host is completed which may be explaining the issue. I also I have noticed that the each thread includes several "poling" items. Due to different network conditions or network device utilization the responses for the each individual item in the thread can be delayed therefore it slows down the whole thread and makes processing of other items from the same thread to be delayed as well. Yes, this may not be a big issue if you perform mooring of devices within a 40ms range with limited number of SNMP objects that you control however as soon as you will require to monitor devices which are 150+ ms away from you and you need to monitor quite few interfaces and/or SNMP objects you can easily exceed 40 or even 60 sec for the single device which I think is nothing abnormal as soon as you don't exceed 60 seconds interval for an individual SNMP object.

Regards
Igor
User avatar
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany
Contact:

Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing

Post by gandalf »

There are some polling timeouts that you may change under "Settings". Does this help you?
R.
kingaru1
Posts: 7
Joined: Thu Feb 02, 2012 8:38 am

Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing

Post by kingaru1 »

gandalf wrote:There are some polling timeouts that you may change under "Settings". Does this help you?
R.

I have script and server timeout values set for 120 and it does not help :(

Regards
Igor
kingaru1
Posts: 7
Joined: Thu Feb 02, 2012 8:38 am

Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing

Post by kingaru1 »

kingaru1 wrote:
gandalf wrote:There are some polling timeouts that you may change under "Settings". Does this help you?
R.

I have script and server timeout values set for 120 and it does not help :(

Regards
Igor
So I guess we ran out of any ideas how to resolve this since there are no any other suggestions ? :(

Regards
Igor
dar1k
Posts: 5
Joined: Fri Sep 21, 2012 2:46 am

Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing

Post by dar1k »

I have the same problem. My spine works unstable. every 5-10 minute I got timeout error:

crontab 5 min.
poller interval 5 min.
Maximum Concurrent Poller Processes 4
Maximum Threads per Process 40
Number of PHP Script Servers 10
Script and Script Server Timeout Value 25
The Maximum SNMP OID's Per SNMP Get Request 40

11/20/2012 02:20:34 PM - SYSTEM THOLD STATS: Time:0.0878 Tholds:39 DownHosts:0
11/20/2012 02:20:34 PM - SYSTEM STATS: Time:32.0892 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 02:18:00 PM - SYSTEM THOLD STATS: Time:0.0881 Tholds:39 DownHosts:0
11/20/2012 02:18:00 PM - SYSTEM STATS: Time:178.8975 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 02:10:33 PM - SYSTEM THOLD STATS: Time:0.0872 Tholds:39 DownHosts:0
11/20/2012 02:10:33 PM - SYSTEM STATS: Time:32.0221 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280

11/20/2012 03:00:27 PM - SYSTEM THOLD STATS: Time:0.0166 Tholds:0 DownHosts:0
11/20/2012 03:00:27 PM - SYSTEM STATS: Time:26.1608 Method:spine Processes:4 Threads:40 Hosts:1298 HostsPerProcess:325 DataSources:0 RRDsProcessed:0
11/20/2012 03:00:01 PM - SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal
11/20/2012 03:00:01 PM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate
11/20/2012 03:00:00 PM - SYSTEM THOLD STATS: Time:0.0167 Tholds:0 DownHosts:0
11/20/2012 03:00:00 PM - SYSTEM STATS: Time:298.5146 Method:spine Processes:4 Threads:40 Hosts:1298 HostsPerProcess:325 DataSources:0 RRDsProcessed:0
11/20/2012 03:00:00 PM - POLLER: Poller[0] Maximum runtime of 298 seconds exceeded. Exiting.
11/20/2012 02:50:33 PM - SYSTEM THOLD STATS: Time:0.1463 Tholds:39 DownHosts:0
11/20/2012 02:50:33 PM - SYSTEM STATS: Time:31.7913 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 02:45:33 PM - SYSTEM THOLD STATS: Time:0.0884 Tholds:39 DownHosts:0
11/20/2012 02:45:33 PM - SYSTEM STATS: Time:31.7094 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280

Why such big difference in runtime?
User avatar
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany
Contact:

Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing

Post by gandalf »

This looks like two pollers are running in parallel. Examine your crontabs, please (see e.g. 2nd link of my sig)
R.
dar1k
Posts: 5
Joined: Fri Sep 21, 2012 2:46 am

Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing

Post by dar1k »

I have only one crontab:

Code: Select all

[root@cactus ~]# cat /etc/cron.d/cacti
*/5 * * * *     cactiuser       /usr/bin/php -q /var/www/cacti/poller.php > /var/www/cacti/log/poller.log 2>&1
[root@cactus ~]#
paulgevers
Cacti Pro User
Posts: 613
Joined: Tue Aug 29, 2006 4:09 pm
Location: NL

Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing

Post by paulgevers »

dar1k wrote:I have only one crontab:

Code: Select all

[root@cactus ~]# cat /etc/cron.d/cacti
*/5 * * * *     cactiuser       /usr/bin/php -q /var/www/cacti/poller.php > /var/www/cacti/log/poller.log 2>&1
[root@cactus ~]#
And maybe one in /etc/crontab ?
Maintainer of cacti in Debian (and Ubuntu).
Cacti 1.* is now officially supported on Debian Stretch via Debian backports
FAQ Ubuntu and Debian differences
Generic cacti debugging
dar1k
Posts: 5
Joined: Fri Sep 21, 2012 2:46 am

Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing

Post by dar1k »

paulgevers wrote:
dar1k wrote:I have only one crontab:

Code: Select all

[root@cactus ~]# cat /etc/cron.d/cacti
*/5 * * * *     cactiuser       /usr/bin/php -q /var/www/cacti/poller.php > /var/www/cacti/log/poller.log 2>&1
[root@cactus ~]#
And maybe one in /etc/crontab ?

Code: Select all

[root@cactus ~]# cat /etc/crontab
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/

# For details see man 4 crontabs

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name command to be executed

[root@cactus ~]#
I've changed poller type from spine to cmd.php, but cmd did not detect down hosts if:
1. Downed Device Detection is icmp ping
2. Downed Device Detection is ping or snmp uptime and host response only snmp it detects as down.
kingaru1
Posts: 7
Joined: Thu Feb 02, 2012 8:38 am

Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing

Post by kingaru1 »

dar1k wrote:I have the same problem. My spine works unstable. every 5-10 minute I got timeout error:

crontab 5 min.
poller interval 5 min.
Maximum Concurrent Poller Processes 4
Maximum Threads per Process 40
Number of PHP Script Servers 10
Script and Script Server Timeout Value 25
The Maximum SNMP OID's Per SNMP Get Request 40

11/20/2012 02:20:34 PM - SYSTEM THOLD STATS: Time:0.0878 Tholds:39 DownHosts:0
11/20/2012 02:20:34 PM - SYSTEM STATS: Time:32.0892 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 02:18:00 PM - SYSTEM THOLD STATS: Time:0.0881 Tholds:39 DownHosts:0
11/20/2012 02:18:00 PM - SYSTEM STATS: Time:178.8975 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 02:10:33 PM - SYSTEM THOLD STATS: Time:0.0872 Tholds:39 DownHosts:0
11/20/2012 02:10:33 PM - SYSTEM STATS: Time:32.0221 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280

11/20/2012 03:00:27 PM - SYSTEM THOLD STATS: Time:0.0166 Tholds:0 DownHosts:0
11/20/2012 03:00:27 PM - SYSTEM STATS: Time:26.1608 Method:spine Processes:4 Threads:40 Hosts:1298 HostsPerProcess:325 DataSources:0 RRDsProcessed:0
11/20/2012 03:00:01 PM - SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal
11/20/2012 03:00:01 PM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate
11/20/2012 03:00:00 PM - SYSTEM THOLD STATS: Time:0.0167 Tholds:0 DownHosts:0
11/20/2012 03:00:00 PM - SYSTEM STATS: Time:298.5146 Method:spine Processes:4 Threads:40 Hosts:1298 HostsPerProcess:325 DataSources:0 RRDsProcessed:0
11/20/2012 03:00:00 PM - POLLER: Poller[0] Maximum runtime of 298 seconds exceeded. Exiting.
11/20/2012 02:50:33 PM - SYSTEM THOLD STATS: Time:0.1463 Tholds:39 DownHosts:0
11/20/2012 02:50:33 PM - SYSTEM STATS: Time:31.7913 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 02:45:33 PM - SYSTEM THOLD STATS: Time:0.0884 Tholds:39 DownHosts:0
11/20/2012 02:45:33 PM - SYSTEM STATS: Time:31.7094 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280

Why such big difference in runtime?
Try to reduce number of processes to default 1 ...

Igor
Post Reply

Who is online

Users browsing this forum: No registered users and 2 guests