Slow Script/Command execution
Hi there,
I have been implementing Cacti at my job and it's becoming increasingly important for our operation.
Lately I have created some graphs for a couple of our SANs, where I'm using check_nrpe (a Nagios program) to poll them and get data.
It seems to work fine and check_nrpe gets the data within a second. However, and here's the real question, it seems that Cacti only runs one check_nrpe at a time, and I cannot understand why. Since it needs to run check_nrpe a few times for each SAN, it now takes up two thirds of the total Cacti polling time. This is a problem because we are running with a 1 minute poller interval, and there are thousands more items we need to graph. I have tried increasing poller processes and spine threads (we're using spine), but it didn't help.
The machine we use, which is dedicated to Cacti, is an 8x 2.66 GHz box with 4 GB RAM and 8x 146 GB 10k rpm SAS drives in RAID 10. The load is negligible, between 0 and 1, so there are no performance issues with the Cacti host. The OS is CentOS 5.3 32-bit.
I have attached relevant parts of "Tech support", and parts of the cacti.log can be found at http://taliz.rtfm.se/cactiout.log (I couldn't attach it here for some reason).
Any pointers and hints much appreciated.
Attachments: cactitech.txt (4.55 KiB)
I have also noticed that lately I have been getting large gaps at night time when the backups run. Extremely frustrating, to say the least.
Is there anything one can do about this? It looks like everything breaks because one or two hosts it polls get a bit lagged.
"07/20/2009 01:41:57 AM - SPINE: Poller[0] WARNING: SS[0] The PHP Script Server did not respond in time and will therefore be restarted" <- broken?
I tried upping threads but that didn't help. It started working again once the backups were done.
I suppose the only thing you can do is try to find which host it is that is lagged and disable that one?
I looked over our scripts and templates and noticed that we're still using the temysql version that isn't using the PHP script server, so I'm going to try and swap to the one thewitness converted.
I'm also going to try and write myself a ss_ script to run check_nrpe through, and see if that will make it spawn more of them.
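For what it's worth, one way to get several checks in flight regardless of how the poller schedules them is to fan out inside a single wrapper and hand back all values at once. Below is a minimal sketch of that idea in Python rather than as an ss_ PHP script. The function names, the worker count, and the timeout are assumptions made for illustration, not anything Cacti or check_nrpe ships.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_check(cmd, timeout=10):
    """Run one command; return its stripped stdout, or None on failure or timeout."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError) as exc:
        print(f"check failed: {cmd!r}: {exc}", file=sys.stderr)
        return None
    output = result.stdout.strip()
    return output or None


def run_checks_parallel(commands, max_workers=8, timeout=10):
    """Run every command concurrently.

    commands maps a field name to an argv list; the result maps the same
    field names to the command output (or None if the check failed).
    """
    names = list(commands)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = pool.map(lambda name: run_check(commands[name], timeout), names)
        return dict(zip(names, outputs))
```

A call could look like `run_checks_parallel({"fp1": ["check_nrpe", "-H", "192.0.2.10", "-c", "check_eva_fp1"]})`, where the address and the command name `check_eva_fp1` are placeholders. The total run time is then roughly that of the slowest check instead of the sum of all of them.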
Running check_nrpe through an ss script didn't help at all. It still only executes one at a time.
Output:
07/20/2009 10:43:36 PM - SPINE: Poller[0] Host[191] DS[5703] SS[5] SERVER: /usr/share/cacti/scripts/ss_eva.php ss_eva <censored>.150 FP4 a, output: controller:a portname:FP4 readreqps:0 readmbps:0.00 readlatencyms:0.0 writereqps:0 writembps:0.00 writelatencyms:0.0
07/20/2009 10:43:36 PM - SPINE: Poller[0] Host[188] DS[5697] SS[4] SERVER: /usr/share/cacti/scripts/ss_eva.php ss_eva <censored>.100 FP2 a, output: controller:a portname:FP2 readreqps:14 readmbps:0.90 readlatencyms:2.6 writereqps:13 writembps:0.92 writelatencyms:0.9
07/20/2009 10:43:37 PM - SPINE: Poller[0] Host[191] DS[5708] SS[6] SERVER: /usr/share/cacti/scripts/ss_eva.php ss_eva <censored>.150 FP1 b, output: controller:b portname:FP1 readreqps:0 readmbps:0.00 readlatencyms:0.0 writereqps:44 writembps:0.36 writelatencyms:0.2
07/20/2009 10:43:38 PM - SPINE: Poller[0] Host[188] DS[5704] SS[7] SERVER: /usr/share/cacti/scripts/ss_eva.php ss_eva <censored>.100 FP1 b, output: controller:b portname:FP1 readreqps:0 readmbps:0.00 readlatencyms:0.0 writereqps:0 writembps:0.00 writelatencyms:0.0
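To put numbers on how the results are spaced, the log lines can be counted per second. The following is a rough sketch that assumes the line layout shown in the output above (timestamp, then `SPINE: Poller[..] Host[..] DS[..] SS[..]`); the function name is made up for this example. Note that the timestamps only have one-second resolution and mark when a result was logged, so this gives a hint rather than proof of serial execution.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the prefix of the spine result lines shown in the log excerpt.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M) - SPINE: "
    r"Poller\[\d+\] Host\[(?P<host>\d+)\] DS\[(?P<ds>\d+)\] SS\[(?P<ss>\d+)\]"
)


def results_per_second(lines):
    """Count how many script server results were logged in each second."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything that is not a spine result line
        when = datetime.strptime(match.group("ts"), "%m/%d/%Y %I:%M:%S %p")
        counts[when] += 1
    return counts
```

Feeding it the whole cacti.log, for example `results_per_second(open("cacti.log"))`, shows whether results arrive in bursts or trickle in one or two per second.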
- gandalf
taliz wrote: FWIW, I'm looking into using snmptools to poll the SAN appliance server with snmp instead of check_nrpe. If that works it should be a lot faster. It doesn't solve the parallel processing problem, but it would be a workaround.
It's not a workaround but the recommended solution.
The "parallel processing" thingy may be a valid issue, but nrpe will be orders of magnitude slower than spine/snmp processing.
As a workaround, you may want to pay attention to the nrpe script timeout.
Reinhard
Since I'm the king of evil hacks....
Have you considered running check_nrpe from cron, writing the output to a file, and using a script to grab that data? Ugly evil hack, but occasionally useful....
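A rough sketch of that hack, in Python: one function meant to be called from a cron job to refresh a cache file, and one meant to be called from the polling script to read it back. The function names, the cache path, and the two-minute staleness limit are assumptions picked for the example.

```python
import os
import subprocess
import tempfile
import time


def refresh_cache(cmd, cache_path, timeout=30):
    """Run cmd and atomically replace cache_path with its stdout.

    Returns True on success, False if the command could not be run,
    timed out, or produced no output (the old cache is left in place).
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return False
    if not result.stdout.strip():
        return False
    # Write to a temp file in the same directory, then rename over the cache,
    # so a reader never sees a half-written file.
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(result.stdout)
    os.replace(tmp_path, cache_path)
    return True


def read_cache(cache_path, max_age=120):
    """Return the cached text, or None if the file is missing or older than max_age seconds."""
    try:
        age = time.time() - os.path.getmtime(cache_path)
    except OSError:
        return None
    if age > max_age:
        return None
    with open(cache_path) as handle:
        return handle.read().strip()
```

The staleness check matters here: without it, a hung cron job would keep feeding the poller the same old values and the graphs would look flat instead of showing a gap.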
--
Live fast, die young
You're sucking up my bandwidth.
J.P. Pasnak,CD
CCNA, LPIC-1
http://www.warpedsystems.sk.ca