SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hos
Moderators: Developers, Moderators
SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hos
Hello,
Well ... running latest version Cacti + latest spine.
having great issues with graphs and rrd not being created for required hosts and or graphs being broken.
It looks like the issue arises when I add more than certain number of items for monitoring.
Cron set for 1 min. Cron interval and Grpah intervals set for 1 min as well.
Keep getting constantly "SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal " in the log
Can somebody please help with this?
Thanks
K.
Well ... running latest version Cacti + latest spine.
having great issues with graphs and rrd not being created for required hosts and or graphs being broken.
It looks like the issue arises when I add more than certain number of items for monitoring.
Cron set for 1 min. Cron interval and Grpah intervals set for 1 min as well.
Keep getting constantly "SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal " in the log
Can somebody please help with this?
Thanks
K.
Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing
C'mon guys ... I don't believe that I'm so unique in the issue ...
Why collects data and does not draw graph or stops creating rrd files at the certain poin!?
Also I dont think this is the issue comes from Spine ... switching to cmd.php does not fix the problem but just transforms error in to a warning "POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 8, Data Sources: blah blah blah "
Could you please at least let me know at least where to dig in? I searched through the posts regarding similar issue but all issues/fixes going back to 2005-2006 ... several releases back ... where all issues were caused by some sort bugs and should be fixed unfortunately it looks like not all of them fixed therefore I need help to help you and myself to fix it ...
Regards
K.
P.S. Just come up in the log " POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate" what is that?
P.P.S Cacti runs on UBUNTU server 11.10
Why collects data and does not draw graph or stops creating rrd files at the certain poin!?
Also I dont think this is the issue comes from Spine ... switching to cmd.php does not fix the problem but just transforms error in to a warning "POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 8, Data Sources: blah blah blah "
Could you please at least let me know at least where to dig in? I searched through the posts regarding similar issue but all issues/fixes going back to 2005-2006 ... several releases back ... where all issues were caused by some sort bugs and should be fixed unfortunately it looks like not all of them fixed therefore I need help to help you and myself to fix it ...
Regards
K.
P.S. Just come up in the log " POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate" what is that?
P.P.S Cacti runs on UBUNTU server 11.10
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
- Contact:
Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing
The overrun issue is thrown in cases, when the poller does not complete within a poller interval (300 sec by default). This often occurs in case of unresponsive targets and/or insane timeouts (e.g. default timeouts) of commands used within scripts.
R.
R.
Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing
Thanks a lot for the feedback. At least someone is reacted at hopefully we can find a solution. Yes, besides general poling traffic of the interfaces via standard way, I also run some of my scripts to pull additional stats. I get all relies from ALL hosts without any problems. Id does not take more than 40 seconds to complete pooling for any of the devices that I have on the list. I.e. for some of the devices it maybe 10 seconds for some more remote 40 (absolute maximum) therefore if my pooling interval 60 seconds that should not be a problem. Correct? Is there a way that I can debug this? IN the log I see following as wellgandalf wrote:The overrun issue is thrown in cases, when the poller does not complete within a poller interval (300 sec by default). This often occurs in case of unresponsive targets and/or insane timeouts (e.g. default timeouts) of commands used within scripts.
R.
02/07/2012 02:01:00 PM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 3, Data Sources: aaaa(DS[3628]), bbb(DS[3628]), ccc(DS[3631])
02/07/2012 02:01:00 PM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate
02/07/2012 02:01:00 PM - POLLER: Poller[0] NOTE: Poller Int: '60', Cron Int: '60', Time Since Last: '59', Max Runtime '58', Poller Runs: '1'
02/07/2012 02:01:00 PM - SYSTEM STATS: Time:58.1966 Method:cmd.php Processes:10 Threads:N/A Hosts:41 HostsPerProcess:5 DataSources:357 RRDsProcessed:161
02/07/2012 02:01:00 PM - POLLER: Poller[0] Maximum runtime of 58 seconds exceeded. Exiting.
So what is the the runtime that I'm exceeded???? I did not exceeded this time for any of the hosts... or are you taking aggregated time for all hosts? If this is the case then I think it is wrong. pooler should be counted separately for each individual device not for all of them together ...
K
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
- Contact:
Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing
In general, this is not correct. In fact, you can't exactly determine, which data sources are polled by which polling process. If I remember correctly, we try to poll a single host by exactly one polling process. If it happens, that two "slow" devices are polled by the same process, polling will time out. And in fact, a response time of 40 sec. is really sloooow, especially when trying to use 60 sec polling interval.kingaru1 wrote: Id does not take more than 40 seconds to complete pooling for any of the devices that I have on the list. I.e. for some of the devices it maybe 10 seconds for some more remote 40 (absolute maximum) therefore if my pooling interval 60 seconds that should not be a problem. Correct?
You may want to use multi-threading for that host (personally, I never used that).
You may run spine in debug level 3 to get polling times for each host. This may help for debugging
R.
Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing
gandalf wrote:In general, this is not correct. In fact, you can't exactly determine, which data sources are polled by which polling process. If I remember correctly, we try to poll a single host by exactly one polling process. If it happens, that two "slow" devices are polled by the same process, polling will time out. And in fact, a response time of 40 sec. is really sloooow, especially when trying to use 60 sec polling interval.kingaru1 wrote: Id does not take more than 40 seconds to complete pooling for any of the devices that I have on the list. I.e. for some of the devices it maybe 10 seconds for some more remote 40 (absolute maximum) therefore if my pooling interval 60 seconds that should not be a problem. Correct?
You may want to use multi-threading for that host (personally, I never used that).
You may run spine in debug level 3 to get polling times for each host. This may help for debugging
R.
OK ... it took me a while to get back to this issue but I made some tests and found that there is no a single host thread that takes more than a 50-55 seconds to complete.
I have 155 hosts configured in cacti and each host runs 6 threads. In some worse cases it takes 50-52 seconds for some of the threads to completed , so I _never_ exceed 60 seconds pooling time for any running [TH] (thread). I assume cacti must empty Output table for the each thread up on completion it's pooling tasks, however I feel that it happens only when the entire pooling process (that includes several threads) for the host is completed which may be explaining the issue. I also I have noticed that the each thread includes several "poling" items. Due to different network conditions or network device utilization the responses for the each individual item in the thread can be delayed therefore it slows down the whole thread and makes processing of other items from the same thread to be delayed as well. Yes, this may not be a big issue if you perform mooring of devices within a 40ms range with limited number of SNMP objects that you control however as soon as you will require to monitor devices which are 150+ ms away from you and you need to monitor quite few interfaces and/or SNMP objects you can easily exceed 40 or even 60 sec for the single device which I think is nothing abnormal as soon as you don't exceed 60 seconds interval for an individual SNMP object.
Regards
Igor
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
- Contact:
Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing
There are some polling timeouts that you may change under "Settings". Does this help you?
R.
R.
Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing
gandalf wrote:There are some polling timeouts that you may change under "Settings". Does this help you?
R.
I have script and server timeout values set for 120 and it does not help
Regards
Igor
Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing
So I guess we ran out of any ideas how to resolve this since there are no any other suggestions ?kingaru1 wrote:gandalf wrote:There are some polling timeouts that you may change under "Settings". Does this help you?
R.
I have script and server timeout values set for 120 and it does not help
Regards
Igor
Regards
Igor
Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing
I have the same problem. My spine works unstable. every 5-10 minute I got timeout error:
crontab 5 min.
poller interval 5 min.
Maximum Concurrent Poller Processes 4
Maximum Threads per Process 40
Number of PHP Script Servers 10
Script and Script Server Timeout Value 25
The Maximum SNMP OID's Per SNMP Get Request 40
11/20/2012 02:20:34 PM - SYSTEM THOLD STATS: Time:0.0878 Tholds:39 DownHosts:0
11/20/2012 02:20:34 PM - SYSTEM STATS: Time:32.0892 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 02:18:00 PM - SYSTEM THOLD STATS: Time:0.0881 Tholds:39 DownHosts:0
11/20/2012 02:18:00 PM - SYSTEM STATS: Time:178.8975 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 02:10:33 PM - SYSTEM THOLD STATS: Time:0.0872 Tholds:39 DownHosts:0
11/20/2012 02:10:33 PM - SYSTEM STATS: Time:32.0221 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 03:00:27 PM - SYSTEM THOLD STATS: Time:0.0166 Tholds:0 DownHosts:0
11/20/2012 03:00:27 PM - SYSTEM STATS: Time:26.1608 Method:spine Processes:4 Threads:40 Hosts:1298 HostsPerProcess:325 DataSources:0 RRDsProcessed:0
11/20/2012 03:00:01 PM - SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal
11/20/2012 03:00:01 PM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate
11/20/2012 03:00:00 PM - SYSTEM THOLD STATS: Time:0.0167 Tholds:0 DownHosts:0
11/20/2012 03:00:00 PM - SYSTEM STATS: Time:298.5146 Method:spine Processes:4 Threads:40 Hosts:1298 HostsPerProcess:325 DataSources:0 RRDsProcessed:0
11/20/2012 03:00:00 PM - POLLER: Poller[0] Maximum runtime of 298 seconds exceeded. Exiting.
11/20/2012 02:50:33 PM - SYSTEM THOLD STATS: Time:0.1463 Tholds:39 DownHosts:0
11/20/2012 02:50:33 PM - SYSTEM STATS: Time:31.7913 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 02:45:33 PM - SYSTEM THOLD STATS: Time:0.0884 Tholds:39 DownHosts:0
11/20/2012 02:45:33 PM - SYSTEM STATS: Time:31.7094 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
Why such big difference in runtime?
crontab 5 min.
poller interval 5 min.
Maximum Concurrent Poller Processes 4
Maximum Threads per Process 40
Number of PHP Script Servers 10
Script and Script Server Timeout Value 25
The Maximum SNMP OID's Per SNMP Get Request 40
11/20/2012 02:20:34 PM - SYSTEM THOLD STATS: Time:0.0878 Tholds:39 DownHosts:0
11/20/2012 02:20:34 PM - SYSTEM STATS: Time:32.0892 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 02:18:00 PM - SYSTEM THOLD STATS: Time:0.0881 Tholds:39 DownHosts:0
11/20/2012 02:18:00 PM - SYSTEM STATS: Time:178.8975 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 02:10:33 PM - SYSTEM THOLD STATS: Time:0.0872 Tholds:39 DownHosts:0
11/20/2012 02:10:33 PM - SYSTEM STATS: Time:32.0221 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 03:00:27 PM - SYSTEM THOLD STATS: Time:0.0166 Tholds:0 DownHosts:0
11/20/2012 03:00:27 PM - SYSTEM STATS: Time:26.1608 Method:spine Processes:4 Threads:40 Hosts:1298 HostsPerProcess:325 DataSources:0 RRDsProcessed:0
11/20/2012 03:00:01 PM - SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal
11/20/2012 03:00:01 PM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate
11/20/2012 03:00:00 PM - SYSTEM THOLD STATS: Time:0.0167 Tholds:0 DownHosts:0
11/20/2012 03:00:00 PM - SYSTEM STATS: Time:298.5146 Method:spine Processes:4 Threads:40 Hosts:1298 HostsPerProcess:325 DataSources:0 RRDsProcessed:0
11/20/2012 03:00:00 PM - POLLER: Poller[0] Maximum runtime of 298 seconds exceeded. Exiting.
11/20/2012 02:50:33 PM - SYSTEM THOLD STATS: Time:0.1463 Tholds:39 DownHosts:0
11/20/2012 02:50:33 PM - SYSTEM STATS: Time:31.7913 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 02:45:33 PM - SYSTEM THOLD STATS: Time:0.0884 Tholds:39 DownHosts:0
11/20/2012 02:45:33 PM - SYSTEM STATS: Time:31.7094 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
Why such big difference in runtime?
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
- Contact:
Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing
This looks like two pollers are running in parallel. Examine your crontabs, please (see e.g. 2nd link of my sig)
R.
R.
Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing
I have only one crontab:
Code: Select all
[root@cactus ~]# cat /etc/cron.d/cacti
*/5 * * * * cactiuser /usr/bin/php -q /var/www/cacti/poller.php > /var/www/cacti/log/poller.log 2>&1
[root@cactus ~]#
-
- Cacti Pro User
- Posts: 613
- Joined: Tue Aug 29, 2006 4:09 pm
- Location: NL
Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing
And maybe one in /etc/crontab ?dar1k wrote:I have only one crontab:Code: Select all
[root@cactus ~]# cat /etc/cron.d/cacti */5 * * * * cactiuser /usr/bin/php -q /var/www/cacti/poller.php > /var/www/cacti/log/poller.log 2>&1 [root@cactus ~]#
Maintainer of cacti in Debian (and Ubuntu).
Cacti 1.* is now officially supported on Debian Stretch via Debian backports
FAQ Ubuntu and Debian differences
Generic cacti debugging
Cacti 1.* is now officially supported on Debian Stretch via Debian backports
FAQ Ubuntu and Debian differences
Generic cacti debugging
Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing
paulgevers wrote:And maybe one in /etc/crontab ?dar1k wrote:I have only one crontab:Code: Select all
[root@cactus ~]# cat /etc/cron.d/cacti */5 * * * * cactiuser /usr/bin/php -q /var/www/cacti/poller.php > /var/www/cacti/log/poller.log 2>&1 [root@cactus ~]#
Code: Select all
[root@cactus ~]# cat /etc/crontab
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/
# For details see man 4 crontabs
# Example of job definition:
# .---------------- minute (0 - 59)
# | .------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# | | | | |
# * * * * * user-name command to be executed
[root@cactus ~]#
1. Downed Device Detection is icmp ping
2. Downed Device Detection is ping or snmp uptime and host response only snmp it detects as down.
Re: SPINE: Poller[0] ERROR: Spine Timed Out While Processing
Try to reduce number of processes to default 1 ...dar1k wrote:I have the same problem. My spine works unstable. every 5-10 minute I got timeout error:
crontab 5 min.
poller interval 5 min.
Maximum Concurrent Poller Processes 4
Maximum Threads per Process 40
Number of PHP Script Servers 10
Script and Script Server Timeout Value 25
The Maximum SNMP OID's Per SNMP Get Request 40
11/20/2012 02:20:34 PM - SYSTEM THOLD STATS: Time:0.0878 Tholds:39 DownHosts:0
11/20/2012 02:20:34 PM - SYSTEM STATS: Time:32.0892 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 02:18:00 PM - SYSTEM THOLD STATS: Time:0.0881 Tholds:39 DownHosts:0
11/20/2012 02:18:00 PM - SYSTEM STATS: Time:178.8975 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 02:10:33 PM - SYSTEM THOLD STATS: Time:0.0872 Tholds:39 DownHosts:0
11/20/2012 02:10:33 PM - SYSTEM STATS: Time:32.0221 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 03:00:27 PM - SYSTEM THOLD STATS: Time:0.0166 Tholds:0 DownHosts:0
11/20/2012 03:00:27 PM - SYSTEM STATS: Time:26.1608 Method:spine Processes:4 Threads:40 Hosts:1298 HostsPerProcess:325 DataSources:0 RRDsProcessed:0
11/20/2012 03:00:01 PM - SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal
11/20/2012 03:00:01 PM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate
11/20/2012 03:00:00 PM - SYSTEM THOLD STATS: Time:0.0167 Tholds:0 DownHosts:0
11/20/2012 03:00:00 PM - SYSTEM STATS: Time:298.5146 Method:spine Processes:4 Threads:40 Hosts:1298 HostsPerProcess:325 DataSources:0 RRDsProcessed:0
11/20/2012 03:00:00 PM - POLLER: Poller[0] Maximum runtime of 298 seconds exceeded. Exiting.
11/20/2012 02:50:33 PM - SYSTEM THOLD STATS: Time:0.1463 Tholds:39 DownHosts:0
11/20/2012 02:50:33 PM - SYSTEM STATS: Time:31.7913 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
11/20/2012 02:45:33 PM - SYSTEM THOLD STATS: Time:0.0884 Tholds:39 DownHosts:0
11/20/2012 02:45:33 PM - SYSTEM STATS: Time:31.7094 Method:spine Processes:4 Threads:40 Hosts:1294 HostsPerProcess:324 DataSources:1618 RRDsProcessed:1280
Why such big difference in runtime?
Igor
Who is online
Users browsing this forum: No registered users and 2 guests