Spine 0.8.7.h hangs

Post general support questions here that do not specifically fall into the Linux or Windows categories.

Moderators: Developers, Moderators

Post Reply
r_kleineisel
Posts: 9
Joined: Wed Jan 26, 2011 4:14 am

Spine 0.8.7.h hangs

Post by r_kleineisel »

Hi,

I upgraded my cacti server to 0.8.7.h yesterday. Now I cannot use spine anymore because spine never finishes. Before the upgrade spine ran fine.

When I use cmd.php as poller everything works fine.

There are no plugins installed, I even uninstalled the PIA.

My setup is:

CentOS 5.7
Cacti 0.8.7.h
Spine 0.8.7h

I poll every 5 minutes.

Running spine on the command line looks good:
# spine -R
SPINE: Using spine config file [/etc/spine.conf]
SPINE: Version 0.8.7h starting
12/09/2011 12:19:11 PM - SPINE: Poller[0] Host[291] TH[1] DS[2739] WARNING: SNMP timeout detected [1000 ms], ignoring host '10.141.85.13'
SPINE: Time: 9.9505 s, Threads: 8, Hosts: 445

If I activate spine as the poller in cacti I get one additional spine process every 5 minutes. They never die or get killed. I have to switch back to cmd.php and kill the spine processes manually.

When I run spine with logging set to "debug" I get this:

12/09/2011 11:45:00 AM - SYSTEM STATS: Time:298.8417 Method:spine Processes:1 Threads:8 Hosts:445 HostsPerProcess:445 DataSources:6120 RRDsProcessed:1295
12/09/2011 11:45:01 AM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '1'
12/09/2011 11:45:01 AM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate
12/09/2011 11:45:01 AM - POLLER: Poller[0] DEBUG: About to Spawn a Remote Process [CMD: /usr/bin/spine, ARGS: 0 477]
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The path_php variable is /usr/bin/php
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The availability_method variable is 4
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The ping_recovery_count variable is 1
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The ping_failure_count variable is 2
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The ping_method variable is 1
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The ping_retries variable is 2
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The ping_timeout variable is 800
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The snmp_retries variable is 5
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The log_perror variable is 1
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The log_pwarn variable is 0
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The boost_redirect variable is 0
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The log_pstats variable is 0
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The threads variable is 8
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The polling interval is 300 seconds
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The number of concurrent processes is 1
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The script timeout is 25
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The number of php script servers to run is 1
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: StartHost='0', EndHost='477', TotalPHPScripts='1'
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The PHP Script Server is Required
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The Maximum SNMP OID Get Size is 30
12/09/2011 11:45:01 AM - SPINE: Poller[0] Version 0.8.7h starting
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: MySQL is Thread Safe!
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: Spine has cap_net_raw capability.
12/09/2011 11:45:01 AM - SPINE: Poller[0] SPINE: Initializing Net-SNMP API
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: Issues with SNMP Header Version information, assuming old version of Net-SNMP.
12/09/2011 11:45:01 AM - SPINE: Poller[0] SPINE: Initializing PHP Script Server(s)
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: SS[0] PHP Script Server Routine Starting
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: SS[0] PHP Script Server About to FORK Child Process
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: SS[0] PHP Script Server Child FORK Success
12/09/2011 11:45:01 AM - PHPSVR: Poller[0] DEBUG: SERVER: spine
12/09/2011 11:45:01 AM - PHPSVR: Poller[0] DEBUG: GETCWD: /usr/bin
12/09/2011 11:45:01 AM - PHPSVR: Poller[0] DEBUG: DIRNAM: /var/www/cacti
12/09/2011 11:45:01 AM - PHPSVR: Poller[0] DEBUG: FILENM: /var/www/cacti/script_server.php
12/09/2011 11:45:01 AM - PHPSVR: Poller[0] PHP Script Server has Started - Parent is spine
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: SS[0] Confirmed PHP Script Server running using readfd[8], writefd[7]
12/09/2011 11:45:01 AM - SPINE: Poller[0] NOTE: Spine will support multithread device polling.
12/09/2011 11:45:01 AM - SPINE: Poller[0] NOTE: Spine is behaving in a 0.8.7g+ manner
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: Initial Value of Active Threads is 0
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[0] TH[1] Total Time: 0.0007 Seconds
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[0] TH[1] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[1] DEBUG: UDP Host Alive, Try Count:1, Time:0.0469 ms
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[1] PING: Result UDP: Host is Alive
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[1] TH[1] Host has no information for recache.
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[1] TH[1] NOTE: There are '5' Polling Items for this Host
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[1] DEBUG: The NIFTY POPEN returned the following File Descriptor 9
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 2
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 3
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 4
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[2] SNMP Result: Host responded to SNMP
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[4] DEBUG: UDP Host Alive, Try Count:1, Time:0.7520 ms
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[2] TH[1] Host has no information for recache.
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 5
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 6
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[2] TH[1] NOTE: There are '89' Polling Items for this Host
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[4] PING Result: UDP: Host is Alive
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[4] SNMP Result: Host responded to SNMP
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[4] TH[1] Host has no information for recache.
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[1] TH[1] DS[4] SCRIPT: perl /var/www/cacti/scripts/linux_memory.pl SwapFree:, output: 8385920
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 7
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[5] DEBUG: UDP Host Alive, Try Count:1, Time:0.4241 ms
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[1] DEBUG: The NIFTY POPEN returned the following File Descriptor 9
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[5] PING Result: UDP: Host is Alive
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 8
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[5] SNMP Result: SNMP not performed due to setting or ping result
12/09/2011 11:45:01 AM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[2] TH[1] DS[3304] SNMP: v2: 10.9.7.3, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.38, value: 45958305488
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[2] TH[1] DS[3304] SNMP: v2: 10.9.7.3, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.38, value: 194881038883
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[2] TH[1] DS[3305] SNMP: v2: 10.9.7.3, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.39, value: 9471421424004
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[2] TH[1] DS[3305] SNMP: v2: 10.9.7.3, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.39, value: 14817214545752
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[2] TH[1] DS[3306] SNMP: v2: 10.9.7.3, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.40, value: 995328332
12/09/2011 11:45:01 AM - SPINE: Poller[0] Host[5] TH[1] Host has no information for recache.
[lots of snmp and rrdtool logs snipped]
12/09/2011 11:45:02 AM - SPINE: Poller[0] Host[63] TH[1] Total Time: 0.018 Seconds
12/09/2011 11:45:02 AM - SPINE: Poller[0] Host[63] TH[1] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
12/09/2011 11:45:02 AM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 7
12/09/2011 11:45:02 AM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
12/09/2011 11:45:02 AM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
12/09/2011 11:45:02 AM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/cacti/rra/myswitchname_ping_823.rrd --template ping 1323427502:0.974
12/09/2011 11:45:02 AM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/cacti/rra/myswitchname_5min_cpu_705.rrd --template 5min_cpu 1323427502:6
12/09/2011 11:45:02 AM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/cacti/rra/myswitchname_traffic_in_712.rrd --template traffic_in:traffic_out 1323427502:0:0
12/09/2011 11:45:02 AM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/cacti/rra/myswitchname_traffic_in_713.rrd --template traffic_in:traffic_out 1323427502:908598534950:240421222163
12/09/2011 11:45:02 AM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/cacti/rra/myswitchname_traffic_in_714.rrd --template traffic_in:traffic_out 1323427502:291643497:21764430
12/09/2011 11:45:02 AM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/cacti/rra/myswitchname_traffic_in_715.rrd --template traffic_in:traffic_out 1323427502:333853020097:737792713472
12/09/2011 11:45:02 AM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/cacti/rra/myswitchname_traffic_in_716.rrd --template traffic_in:traffic_out 1323427502:5345274618:42252335671
12/09/2011 11:45:02 AM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/cacti/rra/myswitchname_traffic_in_717.rrd --template traffic_in:traffic_out 1323427502:970191603260:1146526332717
12/09/2011 11:45:02 AM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/cacti/rra/myswitchname_fanstate_897.rrd --template FanState 1323427502:1
12/09/2011 11:45:02 AM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/cacti/rra/myswitchname_supplystate_898.rrd --template SupplyState 1323427502:1
12/09/2011 11:45:02 AM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/cacti/rra/myswitchname_tempstatval_899.rrd --template TempStatVal:TempStatThr 1323427502:31:59
12/09/2011 11:45:02 AM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 8
12/09/2011 11:50:00 AM - POLLER: Poller[0] Maximum runtime of 298 seconds exceeded. Exiting.
12/09/2011 11:50:00 AM - SYSTEM STATS: Time:298.8373 Method:spine Processes:1 Threads:8 Hosts:445 HostsPerProcess:445 DataSources:6120 RRDsProcessed:1295
12/09/2011 11:50:01 AM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '300', Max Runtime '298', Poller Runs: '1'
12/09/2011 11:50:01 AM - POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate
12/09/2011 11:50:01 AM - POLLER: Poller[0] DEBUG: About to Spawn a Remote Process [CMD: /usr/bin/spine, ARGS: 0 477]

Any ideas?
User avatar
TheWitness
Developer
Posts: 17047
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA
Contact:

Re: Spine 0.8.7.h hangs

Post by TheWitness »

Grab the latest (branches/0.8.7) spine from SVN. It's currently frozen and awaiting the 'i' release over the weekend. What you download is the release version of spine087i. There was a problem introduced in 087h that caused the TCP/ICMP/UDP pinged hosts to perform excessive pings which made spine appear to hang. In addition, there was a new feature call Linux Capabilities that was not working quite right. So, if you attempted ICMP ping and did not have the binary setuid, spine would simply hang instead of falling back to another ping method.

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
Post Reply

Who is online

Users browsing this forum: No registered users and 3 guests