Spine does not read PHP script server output correctly

Post general support questions here that do not specifically fall into the Linux or Windows categories.

Moderators: Developers, Moderators

Post Reply
X-dark
Cacti User
Posts: 111
Joined: Wed Jun 18, 2008 10:21 am
Location: France

Spine does not read PHP script server output correctly

Post by X-dark »

Hi,

I have a Cacti 0.8.7i installation running on a CentOS 5.5 (i686).

It seems Spine does not fetch correctly values output from PHP script server. They seems to be shifted (the output is not associated with the correct counter).

The following patch solves the problem but I don't understand exactly what is happening:

Code: Select all

--- php.c.orig  2012-03-23 17:25:36.000000000 +0100
+++ php.c       2012-03-23 17:02:31.000000000 +0100
@@ -408,8 +408,14 @@
                /* check pipe to insure startup took place */
                if (php_process == PHP_INIT) {
                        result_string = php_readpipe(i);
+                       if (!strstr(result_string, "Started") && (trim(result_string)[0] == '\0')) {
+                               result_string = php_readpipe(i);
+                       }
                }else{
                        result_string = php_readpipe(php_process);
+                        if (!strstr(result_string, "Started") && (trim(result_string)[0] == '\0')) {
+                                result_string = php_readpipe(php_process);
+                        }
                }

                if (strstr(result_string, "Started")) {
Here is part of the log with vanilla Spine:

Code: Select all

DEBUG: The log_destination variable is 4 (STDOUT)
DEBUG: The path_php variable is /usr/bin/php
DEBUG: The availability_method variable is 3
DEBUG: The ping_recovery_count variable is 2
DEBUG: The ping_failure_count variable is 2
DEBUG: The ping_method variable is 1
DEBUG: The ping_retries variable is 1
DEBUG: The ping_timeout variable is 450
DEBUG: The snmp_retries variable is 2
DEBUG: The log_perror variable is 1
DEBUG: The log_pwarn variable is 1
DEBUG: The boost_redirect variable is 0
DEBUG: The log_pstats variable is 1
DEBUG: The threads variable is 100
DEBUG: The polling interval is 300 seconds
DEBUG: The number of concurrent processes is 3
DEBUG: The script timeout is 5
DEBUG: The number of php script servers to run is 1
DEBUG: Host List to be polled='409', TotalPHPScripts='1
DEBUG: The PHP Script Server is Required
DEBUG: The Maximum SNMP OID Get Size is 40
Version 0.8.7i starting
DEBUG: MySQL is Thread Safe!
DEBUG: Spine is running asroot.
SPINE: Initializing Net-SNMP API
DEBUG: Issues with SNMP Header Version information, assuming old version of Net-SNMP.
SPINE: Initializing PHP Script Server(s)
DEBUG: SS[0] PHP Script Server Routine Starting
DEBUG: SS[0] PHP Script Server About to FORK Child Process
DEBUG: SS[0] PHP Script Server Child FORK Success
ERROR: SS[999] Script Server did not start properly return message was:
NOTE: Spine will support multithread device polling.
NOTE: Spine is behaving in a 0.8.7g+ manner
DEBUG: Initial Value of Active Threads is 0
DEBUG: Valid Thread to be Created
DEBUG: In Poller, About to Start Polling of Host
Host[0] TH[1] Total Time: 0.001 Seconds
Host[0] TH[1] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
DEBUG: The Value of Active Threads is 1
DEBUG: Valid Thread to be Created
DEBUG: In Poller, About to Start Polling of Host
Host[409] DEBUG: Entering ICMP Ping
DEBUG: The Value of Active Threads is 2
DEBUG: The Value of Active Threads is 1
Host[409] DEBUG: ICMP Host Alive, Try Count:1, Time:9.2531 ms
Host[409] PING: Result ICMP: Host is Alive
Host[409] TH[1] RECACHE: Processing 1 items in the auto reindex cache for '12.34.56.78
Host[409] TH[1] Recache DataQuery[12] OID: .1.3.6.1.2.1.1.3.0, output: 520921215
Host[409] TH[1] NOTE: There are '208' Polling Items for this Host
Host[409] TH[1] DS[5805] SS[0] SERVER: /cacti/scripts/get_interface3.php ss_get_interface 12.34.56.78 409 2:161:500:40:public get ifInOctets 1, output: U
Host[409] TH[1] DS[5805] SS[0] SERVER: /cacti/scripts/get_interface3.php ss_get_interface 12.34.56.78 409 2:161:500:40:public get ifOutOctets 1, output: 134153709
Host[409] TH[1] DS[5806] SS[0] SERVER: /cacti/scripts/get_interface3.php ss_get_interface 12.34.56.78 409 2:161:500:40:public get ifOutOctets 80, output: 128933806
Host[409] TH[1] DS[5806] SS[0] SERVER: /cacti/scripts/get_interface3.php ss_get_interface 12.34.56.78 409 2:161:500:40:public get ifInOctets 80, output: 151616406
Host[409] TH[1] DS[5807] SS[0] SERVER: /cacti/scripts/get_interface3.php ss_get_interface 12.34.56.78 409 2:161:500:40:public get ifInOctets 81, output: 118614662
Here is the corresponding part of the log with the change above:

Code: Select all

DEBUG: The log_destination variable is 4 (STDOUT)
DEBUG: The path_php variable is /usr/bin/php
DEBUG: The availability_method variable is 3
DEBUG: The ping_recovery_count variable is 2
DEBUG: The ping_failure_count variable is 2
DEBUG: The ping_method variable is 1
DEBUG: The ping_retries variable is 1
DEBUG: The ping_timeout variable is 450
DEBUG: The snmp_retries variable is 2
DEBUG: The log_perror variable is 1
DEBUG: The log_pwarn variable is 1
DEBUG: The boost_redirect variable is 0
DEBUG: The log_pstats variable is 1
DEBUG: The threads variable is 100
DEBUG: The polling interval is 300 seconds
DEBUG: The number of concurrent processes is 3
DEBUG: The script timeout is 5
DEBUG: The number of php script servers to run is 1
DEBUG: Host List to be polled='409', TotalPHPScripts='1
DEBUG: The PHP Script Server is Required
DEBUG: The Maximum SNMP OID Get Size is 40
Version 0.8.7i starting
DEBUG: MySQL is Thread Safe!
DEBUG: Spine is running asroot.
SPINE: Initializing Net-SNMP API
DEBUG: Issues with SNMP Header Version information, assuming old version of Net-SNMP.
SPINE: Initializing PHP Script Server(s)
DEBUG: SS[0] PHP Script Server Routine Starting
DEBUG: SS[0] PHP Script Server About to FORK Child Process
DEBUG: SS[0] PHP Script Server Child FORK Success
DEBUG: SS[0] Confirmed PHP Script Server running using readfd[7], writefd[6]
NOTE: Spine will support multithread device polling.
NOTE: Spine is behaving in a 0.8.7g+ manner
DEBUG: Initial Value of Active Threads is 0
DEBUG: Valid Thread to be Created
DEBUG: In Poller, About to Start Polling of Host
DEBUG: The Value of Active Threads is 1
DEBUG: Valid Thread to be Created
DEBUG: In Poller, About to Start Polling of Host
DEBUG: The Value of Active Threads is 2
Host[0] TH[1] Total Time: 0.0036 Seconds
Host[0] TH[1] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
DEBUG: The Value of Active Threads is 1
Host[409] DEBUG: Entering ICMP Ping
Host[409] DEBUG: ICMP Host Alive, Try Count:1, Time:9.9970 ms
Host[409] PING: Result ICMP: Host is Alive
Host[409] TH[1] RECACHE: Processing 1 items in the auto reindex cache for '12.34.56.78
Host[409] TH[1] Recache DataQuery[12] OID: .1.3.6.1.2.1.1.3.0, output: 520923820
Host[409] TH[1] NOTE: There are '208' Polling Items for this Host
Host[409] TH[1] DS[5805] SS[0] SERVER: /cacti/scripts/get_interface3.php ss_get_interface 12.34.56.78 409 2:161:500:40:public get ifInOctets 1, output: 134154327
Host[409] TH[1] DS[5805] SS[0] SERVER: /cacti/scripts/get_interface3.php ss_get_interface 12.34.56.78 409 2:161:500:40:public get ifOutOctets 1, output: 128934400
Host[409] TH[1] DS[5806] SS[0] SERVER: /cacti/scripts/get_interface3.php ss_get_interface 12.34.56.78 409 2:161:500:40:public get ifOutOctets 80, output: 151617064
Host[409] TH[1] DS[5806] SS[0] SERVER: /cacti/scripts/get_interface3.php ss_get_interface 12.34.56.78 409 2:161:500:40:public get ifInOctets 80, output: 118615220
Host[409] TH[1] DS[5807] SS[0] SERVER: /cacti/scripts/get_interface3.php ss_get_interface 12.34.56.78 409 2:161:500:40:public get ifInOctets 81, output: 118882146
As you can see in this example all output are shifted to the next counter.
Cedric Girard
User avatar
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany
Contact:

Re: Spine does not read PHP script server output correctly

Post by gandalf »

Which version of php was used in this case, please?
X-dark
Cacti User
Posts: 111
Joined: Wed Jun 18, 2008 10:21 am
Location: France

Re: Spine does not read PHP script server output correctly

Post by X-dark »

gandalf wrote:Which version of php was used in this case, please?
It's PHP 5.1.6.

This problem is not specific to Cacti 0.8.7i, I have been able to reproduce it with Cacti 0.8.7e.
Cedric Girard
User avatar
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany
Contact:

Re: Spine does not read PHP script server output correctly

Post by gandalf »

I'm sorry, but as soon as I found your post, I tried to reproduce on a php 5.3.10 to no avail == it works without issues here. So I fear the problem is more sophisticated ...
R.
Swordfish
Posts: 15
Joined: Tue Feb 21, 2006 9:00 pm
Location: Melbourne, Victoria, Australia

Re: Spine does not read PHP script server output correctly

Post by Swordfish »

Hi Gandalf,

Within the last week we have encountered the same issue with the script server as X-dark.

Most of our scripts are Perl, but we do use the built-in ss-based Data Queries:
"[SNMP - Get Mounted Partitions]" - 242 Graphs, and
"[SNMP - Get Processor Information]" - 66 Graphs.

This has been running fine for many years.

Code: Select all

QUAD core Intel(R) Xeon(R) CPU E5440  @ 2.83GHz
8GB RAM
Red Hat Enterprise Linux Server release 5.7 (Tikanga)
PHP 5.1.6-27.el5_5.3
Cacti Version 0.8.7h
NET-SNMP 5.x
RRD-TOOL 1.3.x
5 minute polling cycle using Spine.
Concurrent Poller Processes: 4
Balance Process Load:  yes
Maximum Threads per Process: 15
Number of PHP Script Servers: 2
Script and Script Server Timeout Value: 25
The Maximum SNMP OID's Per SNMP Get Request: 10

Total graphs: 79911
Total devices: 1876

Poller time: 166 seconds
Last week we added a new ss_script for Cisco CPU from:
http://www.eric-a-hall.com/software/cacti-cisco-cpu/
# ss_cisco_cpu_usage.php
# version 0.9e
# November 11, 2010
#
# Copyright (C) 2006-2010, Eric A. Hall
# http://www.eric-a-hall.com/
We have modified it to support the 5second CPU OID also - This was a trivial exercise.

After testing, all looked ok to proceed.
We created a new Host Template and changed 91 hosts to the new template.
See related topic: http://forums.cacti.net/viewtopic.php?f=21&t=47272

Then we used the add_graphs.php script to add the new graphs to all 91 hosts sequentially - 383 Graphs.
This process of adding the graphs took many hours as expected.

The graphs were all working fine until a certain number had been added, then all Script server based graphs showed odd results.

It has taken us a few days to narrow it down to the script server.
If we test the script in the script server (via CLI), the data returned seems to be correct and matched Wireshark captures of the SNMP gets.

However, under load of the normal Cacti polling cycle the data returned via the script server failed to match the Wireshark captures.
This is where the link to this thread comes in.
The Cacti log showed evidence that the values were moving to the next returned value.
This was not 100% consistent, but there was a common occurrence.

Renaming the ss_cisco_cpu_usage.php so that Cacti could not find it, thereby disabling the 383 new graphs, resolved the problem with all other script server based graphs.

I have checked the PHP ss_cisco_cpu_usage.php, and it looks ok, but I'm only a PHP tinkerer.
As part of our debugging, the php.ini MEM=512M was increased to MEM=1024M.
However checking the Processes, I don't see PHP using more than .01% RAM for any of the 8 script server processes.

Do you have any tips for debugging this further?
Or is there a debug version of script server?

Thanks Rob.

EDIT: attached 2 graph example pictures.
Attachments
CPU Usage
CPU Usage
2012-05-25_223935.jpg (91.52 KiB) Viewed 1500 times
Disk Usage
Disk Usage
2012-05-25_223836.jpg (87.89 KiB) Viewed 1500 times
Swordfish
Posts: 15
Joined: Tue Feb 21, 2006 9:00 pm
Location: Melbourne, Victoria, Australia

PHP script server output is corrupt under load

Post by Swordfish »

Hello,

In the last 3 weeks since I posted on this topic (with no comment from anyone), I have found a workaround.

Our Spine process were set to

Code: Select all

Concurrent Poller Processes: 4
Balance Process Load:  yes
Maximum Threads per Process: 15
Number of PHP Script Servers: 2 (per Spine Process)
For unrelated reasons we upgraded our ESX server.
The CPU and Disk performance improvement dropped our Spine Poller runtime in half.

We then decided to try try tweaking the processes and thread count for Spine and Script Server.

Various setting combinations gave somewhat better or worse results.
It wasn't until I set Script Server per Spine process to 1, did we get working graphs.
However to maintain the equivalent of 60 threads above, I set as follows:

Code: Select all

Concurrent Poller Processes: 60
Balance Process Load:  yes
Maximum Threads per Process: 1
Number of PHP Script Servers: 1 (per Spine Process)
This had the effect of doubling our poller run time.

Further tests showed that we could achieve close to our previous poller runtime with:

Code: Select all

Concurrent Poller Processes: 140
Balance Process Load:  yes
Maximum Threads per Process: 1
Number of PHP Script Servers: 1 (per Spine Process)
The hit on the server is that we have doubled our average process count to nearly 300, and the load average is quite high.
The CPU is not affected, but memory has taken a hit.

At this stage we are running ok, but this not acceptable for the long term.

In Summary: It seems that the Script Server is not thread safe - in that, using a Spine Threads value of more than 1 caused corrupt data being generated by the Script Server.

Hopefully someone has some insight to this issue.

::Fish::
Post Reply

Who is online

Users browsing this forum: No registered users and 2 guests