Spine still running after poller exits

Post support questions that directly relate to Linux/Unix operating systems.

Moderators: Developers, Moderators

Post Reply
adamski
Posts: 5
Joined: Tue May 17, 2011 10:50 am

Spine still running after poller exits

Post by adamski »

Hi,
I've recently been trying to get spine working as a replacement for the standard cmd.php on an AIX environment with Cacti.
I have managed to compile the latest version of spine 0.8.7g, to run with 0.8.7e of Cacti.

Spine itself seems to work fine and is able to poll devices significantly faster than cmd.php does, however it seems that the majority of the time, poller.php exits before spine has completed... The strange thing is that it doesnt look like it crashes, as it outputs the "SYSTEM STATS" line in the cacti.log file. This then means that the results table doesnt get cleared out, and not all RRDs get updated.

Here's an example of spine continuing after poller, just after I have switched from using cmd.php. As you can see the number of RRDs updated is less than with cmd.php, and if I leave spine running for several iterations, the number of processes RRDs continues to decrease.

RRDsProcessed declining:

Code: Select all

05/17/2011 01:59:43 PM - SYSTEM STATS: Time:23.4199 Method:spine Processes:20 Threads:10 Hosts:71 HostsPerProcess:4 DataSources:3993 RRDsProcessed:1742
05/17/2011 02:04:45 PM - SYSTEM STATS: Time:25.9708 Method:spine Processes:20 Threads:10 Hosts:71 HostsPerProcess:4 DataSources:3993 RRDsProcessed:1779
05/17/2011 02:09:35 PM - SYSTEM STATS: Time:16.7437 Method:spine Processes:20 Threads:10 Hosts:71 HostsPerProcess:4 DataSources:3993 RRDsProcessed:1314
Spine running after poller has finished (SYSTEM STATS appears, but 2 seconds after spine continues to collect stats, meaning poller_output_empty.php has to be run):

Code: Select all

05/17/2011 10:24:26 AM - SPINE: Poller[0] Host[190] TH[1] DS[3570] SNMP: v1: XXXXXXXXXXXXXX, dsname: memXXee, oid: 1.3.6.1.4.1.6848.2.3.2.1.3.13, value: 46784
05/17/2011 10:24:26 AM - SPINE: Poller[0] Host[190] TH[1] DS[3570] SNMP: v1: XXXXXXXXXXXXXX, dsname: swapXXee, oid: 1.3.6.1.4.1.6848.2.3.3.1.3.13, value: 1003896
05/17/2011 10:24:30 AM - SYSTEM STATS: Time:10.5139 Method:spine Processes:10 Threads:25 Hosts:73 HostsPerProcess:8 DataSources:4055 RRDsProcessed:1630
05/17/2011 10:24:32 AM - SPINE: Poller[0] Host[189] TH[1] DS[3331] SNMP: v1: XXXXXXXXXXXXXX, dsname: cpuLoXX, oid: 1.3.6.1.4.1.6848.2.3.1.4.1, value: 900
05/17/2011 10:24:32 AM - SPINE: Poller[0] Host[189] TH[1] DS[3332] SNMP: v1: XXXXXXXXXXXXXX, dsname: cpuLoXX, oid: 1.3.6.1.4.1.6848.2.3.1.4.2, value: 900
05/17/2011 10:24:32 AM - SPINE: Poller[0] Host[189] TH[1] DS[3333] SNMP: v1: XXXXXXXXXXXXXX, dsname: cpuLoXX, oid: 1.3.6.1.4.1.6848.2.3.1.4.5, value: 101
05/17/2011 10:24:32 AM - SPINE: Poller[0] Host[189] TH[1] DS[3334] SNMP: v1: XXXXXXXXXXXXXX, dsname: cpuLoXX, oid: 1.3.6.1.4.1.6848.2.3.1.4.6, value: 100
05/17/2011 10:24:32 AM - SPINE: Poller[0] Host[189] TH[1] DS[3335] SNMP: v1: XXXXXXXXXXXXXX, dsname: cpuLoXX, oid: 1.3.6.1.4.1.6848.2.3.1.4.7, value: 0
05/17/2011 10:24:32 AM - SPINE: Poller[0] Host[189] TH[1] DS[3336] SNMP: v1: XXXXXXXXXXXXXX, dsname: cpuLoXX, oid: 1.3.6.1.4.1.6848.2.3.1.4.13, value: 2
05/17/2011 10:24:32 AM - SPINE: Poller[0] Host[189] TH[1] DS[3337] SNMP: v1: XXXXXXXXXXXXXX, dsname: cpuLoXX, oid: 1.3.6.1.4.1.6848.2.3.1.4.14, value: 0
I've taken a look at a few threads in the forums already, but can't seem to find anything that matches the problem I'm having.

Thanks

Adam
User avatar
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany
Contact:

Re: Spine still running after poller exits

Post by gandalf »

Please grep for SYSTEM STATS and tail last 10 lines. Post them
R.
adamski
Posts: 5
Joined: Tue May 17, 2011 10:50 am

Re: Spine still running after poller exits

Post by adamski »

Hi,
Here's the output from the cacti.log file. At the moment I've reverted to just cmd.php.

Code: Select all

# grep 'SYSTEM STATS' /weblogs/apache/cacti.log | tail -10
05/23/2011 08:42:35 AM - SYSTEM STATS: Time:75.0273 Method:cmd.php Processes:20 Threads:N/A Hosts:71 HostsPerProcess:4 DataSources:3993 RRDsProcessed:1818
05/23/2011 08:47:23 AM - SYSTEM STATS: Time:63.0846 Method:cmd.php Processes:20 Threads:N/A Hosts:71 HostsPerProcess:4 DataSources:3993 RRDsProcessed:1818
05/23/2011 08:52:38 AM - SYSTEM STATS: Time:78.6813 Method:cmd.php Processes:20 Threads:N/A Hosts:71 HostsPerProcess:4 DataSources:3993 RRDsProcessed:1818
05/23/2011 08:57:26 AM - SYSTEM STATS: Time:66.8272 Method:cmd.php Processes:20 Threads:N/A Hosts:71 HostsPerProcess:4 DataSources:3993 RRDsProcessed:1818
05/23/2011 09:02:33 AM - SYSTEM STATS: Time:70.3538 Method:cmd.php Processes:20 Threads:N/A Hosts:71 HostsPerProcess:4 DataSources:3993 RRDsProcessed:1818
05/23/2011 09:07:31 AM - SYSTEM STATS: Time:72.4698 Method:cmd.php Processes:20 Threads:N/A Hosts:71 HostsPerProcess:4 DataSources:3993 RRDsProcessed:1818
05/23/2011 09:12:36 AM - SYSTEM STATS: Time:76.7454 Method:cmd.php Processes:20 Threads:N/A Hosts:71 HostsPerProcess:4 DataSources:3993 RRDsProcessed:1818
05/23/2011 09:17:38 AM - SYSTEM STATS: Time:79.1249 Method:cmd.php Processes:20 Threads:N/A Hosts:71 HostsPerProcess:4 DataSources:3993 RRDsProcessed:1818
05/23/2011 09:22:45 AM - SYSTEM STATS: Time:84.4075 Method:cmd.php Processes:20 Threads:N/A Hosts:71 HostsPerProcess:4 DataSources:3993 RRDsProcessed:1818
05/23/2011 09:27:29 AM - SYSTEM STATS: Time:70.2270 Method:cmd.php Processes:20 Threads:N/A Hosts:71 HostsPerProcess:4 DataSources:3993 RRDsProcessed:1818
User avatar
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany
Contact:

Re: Spine still running after poller exits

Post by gandalf »

That looks fine. Timestamps match the poller execution time
R.
adamski
Posts: 5
Joined: Tue May 17, 2011 10:50 am

Re: Spine still running after poller exits

Post by adamski »

cmd.php is fine, yes. It's spine that is having the issues when it's turned on. Spine is disabled at the moment so the only logs are those that I posted previously.
User avatar
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany
Contact:

Re: Spine still running after poller exits

Post by gandalf »

Running spine manually does return what? (NOT running spine through poller.php but purely manually)
R.
adamski
Posts: 5
Joined: Tue May 17, 2011 10:50 am

Re: Spine still running after poller exits

Post by adamski »

I've recompiled spine, and updated to 0.8.7h.

Running spine manually doesn't return any errors.

Code: Select all

# ./spine -R  -S | more
NOTE: Spine did not detect multithreaded device polling.
NOTE: Spine is behaving in a 0.8.7g manner
Host[31] TH[1] NOTE: There are '93' Polling Items for this Host
Host[33] TH[1] NOTE: There are '91' Polling Items for this Host
Host[32] TH[1] NOTE: There are '158' Polling Items for this Host
Host[36] TH[1] NOTE: There are '57' Polling Items for this Host
Host[35] TH[1] NOTE: There are '83' Polling Items for this Host
Host[38] TH[1] NOTE: There are '71' Polling Items for this Host
Host[34] TH[1] NOTE: There are '83' Polling Items for this Host
Host[73] TH[1] NOTE: There are '42' Polling Items for this Host
... etc
It's always the same host that seems to be lagging behind after the poller process has completed.


If you look at the last few runs on the poller with spine enabled, you can see the RRDsProcessed varying a lot. However when cmd.php is enabled, it's always 1619, as it seems to wait for all the reponses to come back before exiting.

Code: Select all

10/21/2011 02:40:35 PM - SYSTEM STATS: Time:76.0756 Method:cmd.php Processes:15 Threads:N/A Hosts:70 HostsPerProcess:5 DataSources:3766 RRDsProcessed:1619
10/21/2011 02:45:45 PM - SYSTEM STATS: Time:84.1565 Method:cmd.php Processes:15 Threads:N/A Hosts:70 HostsPerProcess:5 DataSources:3766 RRDsProcessed:1619
10/21/2011 02:50:40 PM - SYSTEM STATS: Time:76.1229 Method:cmd.php Processes:15 Threads:N/A Hosts:70 HostsPerProcess:5 DataSources:3766 RRDsProcessed:1619
10/21/2011 02:55:35 PM - SYSTEM STATS: Time:74.5412 Method:cmd.php Processes:15 Threads:N/A Hosts:70 HostsPerProcess:5 DataSources:3766 RRDsProcessed:1619
10/21/2011 03:00:47 PM - SYSTEM STATS: Time:88.9817 Method:cmd.php Processes:15 Threads:N/A Hosts:70 HostsPerProcess:5 DataSources:3766 RRDsProcessed:1619
10/21/2011 03:05:59 PM - SYSTEM STATS: Time:90.9373 Method:cmd.php Processes:15 Threads:N/A Hosts:70 HostsPerProcess:5 DataSources:3766 RRDsProcessed:1619
10/21/2011 03:09:27 PM - SYSTEM STATS: Time:8.0323 Method:spine Processes:15 Threads:5 Hosts:70 HostsPerProcess:5 DataSources:3766 RRDsProcessed:1517
10/21/2011 03:14:32 PM - SYSTEM STATS: Time:11.3575 Method:spine Processes:15 Threads:5 Hosts:70 HostsPerProcess:5 DataSources:3766 RRDsProcessed:1396
10/21/2011 03:19:37 PM - SYSTEM STATS: Time:16.1022 Method:spine Processes:15 Threads:6 Hosts:70 HostsPerProcess:5 DataSources:3766 RRDsProcessed:1395
10/21/2011 03:24:44 PM - SYSTEM STATS: Time:20.8543 Method:spine Processes:15 Threads:6 Hosts:70 HostsPerProcess:5 DataSources:3766 RRDsProcessed:1469
10/21/2011 03:29:53 PM - SYSTEM STATS: Time:32.1076 Method:spine Processes:15 Threads:6 Hosts:70 HostsPerProcess:5 DataSources:3766 RRDsProcessed:1533
10/21/2011 03:34:49 PM - SYSTEM STATS: Time:19.1693 Method:spine Processes:15 Threads:6 Hosts:70 HostsPerProcess:5 DataSources:3766 RRDsProcessed:1404
10/21/2011 03:39:44 PM - SYSTEM STATS: Time:22.5317 Method:spine Processes:15 Threads:6 Hosts:70 HostsPerProcess:5 DataSources:3766 RRDsProcessed:1619
10/21/2011 03:44:42 PM - SYSTEM STATS: Time:21.4538 Method:spine Processes:15 Threads:10 Hosts:70 HostsPerProcess:5 DataSources:3766 RRDsProcessed:1472
I can run poller_output_empty.php to empty out the results before each run of the poller. But this isn't ideal.
User avatar
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA
Contact:

Re: Spine still running after poller exits

Post by TheWitness »

Your doing way too much parallelization. You need to tone it down. You may be running out of connections.
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
adamski
Posts: 5
Joined: Tue May 17, 2011 10:50 am

Re: Spine still running after poller exits

Post by adamski »

Thanks for that.
I tried turning it down to 2 processes and 2 threads.

It then seemed to complete in less than a second, and updated 0 RRDs... However, again in the log, you can see it continued to process hosts, for the next 30 seconds.

Code: Select all

10/25/2011 11:43:56 AM - SYSTEM STATS: Time:28.5780 Method:spine Processes:15 Threads:10 Hosts:70 HostsPerProcess:5 DataSources:3766 RRDsProcessed:1447
10/25/2011 11:48:24 AM - POLLER: Poller[0] NOTE: Poller Int: '300', Cron Int: '300', Time Since Last: '297', Max Runtime '298', Poller Runs: '1'
10/25/2011 11:48:24 AM - SPINE: Poller[0] NOTE: Spine did not detect multithreaded device polling.
10/25/2011 11:48:24 AM - SPINE: Poller[0] NOTE: Spine is behaving in a 0.8.7g manner
10/25/2011 11:48:24 AM - SPINE: Poller[0] NOTE: Spine did not detect multithreaded device polling.
10/25/2011 11:48:24 AM - SPINE: Poller[0] NOTE: Spine is behaving in a 0.8.7g manner
10/25/2011 11:48:24 AM - SPINE: Poller[0] Host[31] TH[1] NOTE: There are '93' Polling Items for this Host
10/25/2011 11:48:24 AM - SPINE: Poller[0] Host[32] TH[1] NOTE: There are '158' Polling Items for this Host
10/25/2011 11:48:24 AM - SYSTEM STATS: Time:0.2583 Method:spine Processes:2 Threads:2 Hosts:70 HostsPerProcess:35 DataSources:3766 RRDsProcessed:0
10/25/2011 11:48:24 AM - SPINE: Poller[0] Host[131] TH[1] NOTE: There are '39' Polling Items for this Host
10/25/2011 11:48:24 AM - SPINE: Poller[0] Host[32] TH[1] DS[431] SNMP: v2: XXXXX, dsname: connections, oid: .1.3.6.1.4.1.3375.2.1.1.2.1.8.0, value: 5369
10/25/2011 11:48:24 AM - SPINE: Poller[0] Host[134] TH[1] NOTE: There are '6' Polling Items for this Host
10/25/2011 11:48:24 AM - SPINE: Poller[0] Host[32] TH[1] DS[432] SNMP: v2: XXXXX, dsname: http_get_req, oid: .1.3.6.1.4.1.3375.2.1.1.2.4.8.0, value: 14507064435
10/25/2011 11:48:24 AM - SPINE: Poller[0] Host[32] TH[1] DS[433] SNMP: v2: XXXXX, dsname: http_post_req, oid: .1.3.6.1.4.1.3375.2.1.1.2.4.9.0, value: 179580537
10/25/2011 11:48:24 AM - SPINE: Poller[0] Host[32] TH[1] DS[434] SNMP: v2: XXXXX, dsname: http_requests, oid: .1.3.6.1.4.1.3375.2.1.1.2.4.7.0, value: 14718037196
10/25/2011 11:48:24 AM - SPINE: Poller[0] Host[32] TH[1] DS[435] SNMP: v2: XXXXX, dsname: http_v10_req, oid: .1.3.6.1.4.1.3375.2.1.1.2.4.11.0, value: 175303924
10/25/2011 11:48:24 AM - SPINE: Poller[0] Host[32] TH[1] DS[436] SNMP: v2: XXXXX, dsname: http_v11_req, oid: .1.3.6.1.4.1.3375.2.1.1.2.4.12.0, value: 14542710888
10/25/2011 11:48:24 AM - SPINE: Poller[0] Host[32] TH[1] DS[437] SNMP: v2: XXXXX, dsname: http_v9_req, oid: .1.3.6.1.4.1.3375.2.1.1.2.4.10.0, value: 22321
10/25/2011 11:48:24 AM - SPINE: Poller[0] Host[32] TH[1] DS[438] SNMP: v2: XXXXX, dsname: http_4xx_resp, oid: .1.3.6.1.4.1.3375.2.1.1.2.4.5.0, value: 74628478
...
10/25/2011 11:48:35 AM - SPINE: Poller[0] Host[75] TH[1] DS[2092] SNMP: v2: XXXXX, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.4, value: 580343838
10/25/2011 11:48:35 AM - SPINE: Poller[0] Host[75] TH[1] DS[2092] SNMP: v2: XXXXX, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.4, value: 2894536285
10/25/2011 11:48:35 AM - SPINE: Poller[0] Host[76] TH[1] NOTE: There are '40' Polling Items for this Host
10/25/2011 11:48:35 AM - SPINE: Poller[0] Host[75] TH[1] DS[2093] SNMP: v2: XXXXX, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.5, value: 402966421
10/25/2011 11:48:35 AM - SPINE: Poller[0] Host[75] TH[1] DS[2093] SNMP: v2: XXXXX, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.5, value: 0
...
10/25/2011 11:48:52 AM - SPINE: Poller[0] Host[190] TH[1] DS[3569] SNMP: v1: XXXXX, dsname: swapUsed, oid: 1.3.6.1.4.1.6848.2.3.3.1.2.5, value: 0
10/25/2011 11:48:52 AM - SPINE: Poller[0] Host[190] TH[1] DS[3570] SNMP: v1: XXXXX, dsname: swapUsed, oid: 1.3.6.1.4.1.6848.2.3.3.1.2.13, value: 0
10/25/2011 11:48:52 AM - SPINE: Poller[0] Host[190] TH[1] DS[3570] SNMP: v1: XXXXX, dsname: memFree, oid: 1.3.6.1.4.1.6848.2.3.2.1.3.13, value: 68496
10/25/2011 11:48:52 AM - SPINE: Poller[0] Host[190] TH[1] DS[3570] SNMP: v1: XXXXX, dsname: swapFree, oid: 1.3.6.1.4.1.6848.2.3.3.1.3.13, value: 1003896
10/25/2011 11:48:52 AM - SPINE: Poller[0] Host[190] TH[1] DS[3570] SNMP: v1: XXXXX, dsname: memUsed, oid: 1.3.6.1.4.1.6848.2.3.2.1.2.13, value: 958032
10/25/2011 11:48:52 AM - SPINE: Poller[0] Time: 28.4958 s, Threads: 2, Hosts: 45
I even tried turning it down to 1 process and only 2 threads, but got a similar result:

Code: Select all

10/25/2011 11:53:53 AM - SYSTEM STATS: Time:1.5432 Method:spine Processes:1 Threads:2 Hosts:70 HostsPerProcess:70 DataSources:3766 RRDsProcessed:63
Also tried 2 processes with 1 thread each:

Code: Select all

10/25/2011 12:03:26 PM - SYSTEM STATS: Time:0.2496 Method:spine Processes:2 Threads:1 Hosts:70 HostsPerProcess:35 DataSources:3766 RRDsProcessed:0
User avatar
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA
Contact:

Re: Spine still running after poller exits

Post by TheWitness »

Not too sure what you did. You should verify that it's running well from the command line. Also, make sure your ID has the correct permissions.
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
Post Reply

Who is online

Users browsing this forum: No registered users and 4 guests