Slow Script/Command execution

taliz · Post by **taliz** » Sun Jul 19, 2009 9:05 am

Hi there,

I have been implementing cacti at my job and its becoming increasingly important for our operation.
Lately I have created some graphing for a couple of our SAN's, where I'm using check_nrpe(a nagios program) to poll them and get data.
It seems to work fine and check_nrpe gets the data within a second. However, and here's the real question, it seems that Cacti only runs one check_nrpe at a time, and I cannot understand why. Since it needs to run check_nrpe a few times for each SAN it now takes up 2/3's of the total Cacti polling time. This is a problem because we are running with a 1 minute poller interval, and there are thousands of more items we need to graph. I have tried increasing poller processes & spine threads(we're using spine) but to no help.
The machine we use, which is dedicated to Cacti, is a 8x 2,66ghz with 4gb ram and 8x 146gb 10krpm sas drives in raid10. The load is negligible, between 0 and 1, hence there are no performance issues with the Cacti host. The OS is Centos 5.3 32-bit.
I have attached relevant parts of "Tech support" and parts of the cacti.log can be found at http://taliz.rtfm.se/cactiout.log(I couldn't attach it here for some reason).

Any pointers and hints much appreciated.

taliz · Post by **taliz** » Sun Jul 19, 2009 6:43 pm

I have also noticed that lately I have been getting large gaps at night time when the backups run. Extremely frustrating, to say the least.

Is there anything one can do about this? It looks like everything breaks because one or two hosts it polls get a bit lagged.

"07/20/2009 01:41:57 AM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted" <- broken?

Tried upping threads but that didnt help, it started working again when backups were done.
I suppose the only thing you can do is try to find which host it is that is lagged and disable that one?

taliz · Post by **taliz** » Mon Jul 20, 2009 2:24 pm

I looked over our scripts and templates and noticed that we're still using temysql that isnt using the php script server, so I'm going to try and swap to the one thewitness converted.

I'm also going to try and write myself a ss_ script to run check_nrpe through, and see if that will make it spawn more of them.

taliz · Post by **taliz** » Mon Jul 20, 2009 3:45 pm

Running check_nrpe through an ss script didnt help at all, it still only executes one at a time.

Output:
07/20/2009 10:43:36 PM - SPINE: Poller[0] Host[191] DS[5703] SS[5] SERVER: /usr/share/cacti/scripts/ss_eva.php ss_eva <censored>.150 FP4 a, output: controller:a portname:FP4 readreqps:0 readmbps:0.00 readlatencyms:0.0 writereqps:0 writembps:0.00 writelatencyms:0.0
07/20/2009 10:43:36 PM - SPINE: Poller[0] Host[188] DS[5697] SS[4] SERVER: /usr/share/cacti/scripts/ss_eva.php ss_eva <censored>.100 FP2 a, output: controller:a portname:FP2 readreqps:14 readmbps:0.90 readlatencyms:2.6 writereqps:13 writembps:0.92 writelatencyms:0.9
07/20/2009 10:43:37 PM - SPINE: Poller[0] Host[191] DS[5708] SS[6] SERVER: /usr/share/cacti/scripts/ss_eva.php ss_eva <censored>.150 FP1 b, output: controller:b portname:FP1 readreqps:0 readmbps:0.00 readlatencyms:0.0 writereqps:44 writembps:0.36 writelatencyms:0.2
07/20/2009 10:43:38 PM - SPINE: Poller[0] Host[188] DS[5704] SS[7] SERVER: /usr/share/cacti/scripts/ss_eva.php ss_eva <censored>.100 FP1 b, output: controller:b portname:FP1 readreqps:0 readmbps:0.00 readlatencyms:0.0 writereqps:0 writembps:0.00 writelatencyms:0.0

taliz · Post by **taliz** » Mon Jul 20, 2009 4:17 pm

Is there anyone alive in this forum at all?

Lt_Flash · Post by **Lt_Flash** » Mon Jul 20, 2009 4:25 pm

Have you tried to increase max execution time in php.ini? Set it from 30 seconds to higher value.

taliz · Post by **taliz** » Mon Jul 20, 2009 4:33 pm

Lt_Flash wrote:Have you tried to increase max execution time in php.ini? Set it from 30 seconds to higher value.

Thanks for your reply, although I don't see how that would make it execute the scripts simultanously?

taliz · Post by **taliz** » Tue Jul 21, 2009 8:22 am

Perhaps I should have named this thread "parallel processing issue".

taliz · Post by **taliz** » Tue Jul 21, 2009 1:45 pm

FWIW, I'm looking into using snmptools to poll the SAN appliance server with snmp instead of check_nrpe. If that works it should be a lot faster. It doesn't solve the parallel processing problem, but it would be a workaround.

Post by **gandalf** » Thu Aug 27, 2009 2:25 pm

taliz wrote:FWIW, I'm looking into using snmptools to poll the SAN appliance server with snmp instead of check_nrpe. If that works it should be a lot faster. It doesn't solve the parallel processing problem, but it would be a workaround.

It's not a workaround but the recommended solution.
The "parallel processing" thingy may be a valid issue, but nrpe will be magnitudes slower compared to spine/snmp processing
As a workaround, you may want to pay attention to nrpe script timeout
Reinhard

Post by **Linegod** » Fri Aug 28, 2009 1:39 am

Since I'm the king of evil hacks....

Have you considered running the check_npre as a cron, writing the output, and using a script to grab that data? Ugly evil hack, but occasionally useful....

Slow Script/Command execution

Slow Script/Command execution

Who is online