I've inherited maintenance of a Cacti server. Version is 0.8.8a running on Windows 2008 R2 64-bit. Cygwin is installed.
Sometimes performance on the server grinds to a halt. When this happens, process monitor shows multiple (20 or more, sometimes) spine.exe processes with the command line "0 21". I've been unable to determine what exactly those parameters mean so I can troubleshoot further.
Also, there is frequently a "sh.exe" process that is hung running one script or another. Most recently, this was: sh.exe -c "C:/php/php.exe -q C:/inetpub/wwwroot/cacti/scripts/query_cisco4400_wap_cnt.php 10.0.250.4 [password] 2"
Finally, the cacti log is littered with messages like this:
CMDPHP: Poller[0] ERROR: SQL Assoc Failed!, Error:'1064', SQL:"select field_name,field_value from host_snmp_cache where host_id= and snmp_query_id= and snmp_index=''"
Obviously, that's not valid SQL, since there is no value specified for host_id or snmp_query_id.
So my questions would be:
1) How do I troubleshoot the hung spine.exe processes and what do the parameters mean?
2) How do I troubleshoot the CPU-intensive sh.exe process?
3) How do I correct the MySQL errors?
I'm attaching the technical support info from our Cacti installation.
Thanks,
Bryan
Hanging spine.exe and sh.exe processes
Re: Hanging spine.exe and sh.exe processes
Spine.exe 0 21 tells it the range of hostIDs it should poll. You have 22 devices, right?
Sounds like query_cisco4400_wap_cnt.php is a poorly written script that does not contain timeout logic to self-terminate. Cacti can wait indefinitely for rogue scripts to finish, which can cause the poller to run too long.
MySQL error 1064 is a syntax error. Looks like something created an invalid query, as there are no values specified for those ID fields. Probably a plugin. Disable plugins one by one until the problem stops, and check that you're running the latest plugin versions.
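To illustrate how a query like the one in the log can come about, here is a minimal Python sketch. It is not Cacti's code: the function names are hypothetical, and the `%s` placeholders assume a DB-API style MySQL driver. The first function pastes empty values straight into the SQL text and reproduces the logged query; the second refuses to build a query when an ID is missing.

```python
def naive_query(host_id, snmp_query_id, snmp_index):
    """Build the SQL by pasting values into the string (the fragile way)."""
    return ("select field_name,field_value from host_snmp_cache "
            f"where host_id={host_id} and snmp_query_id={snmp_query_id} "
            f"and snmp_index='{snmp_index}'")


def build_cache_query(host_id, snmp_query_id, snmp_index):
    """Return (sql, params), or None when either ID is missing or not numeric."""
    try:
        host_id = int(host_id)
        snmp_query_id = int(snmp_query_id)
    except (TypeError, ValueError):
        # Empty string or None: skip the lookup instead of sending broken SQL.
        return None
    sql = ("select field_name,field_value from host_snmp_cache "
           "where host_id=%s and snmp_query_id=%s and snmp_index=%s")
    return sql, (host_id, snmp_query_id, str(snmp_index))
```

Calling `naive_query("", "", "")` yields the same text as the failing query in the log, which suggests that whatever builds the query is being handed empty IDs. Finding out which caller passes empty values is the real fix; the guard only keeps the bad query from reaching MySQL.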
Code:
C:\Spine>spine -h
SPINE 0.8.8a Copyright 2002-2011 by The Cacti Group
Usage: spine [options] [[firstid lastid] || [-H/--hostlist='hostid1,hostid2,...,hostidn']]
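Going by that usage line, the two numbers in "0 21" are firstid and lastid. A small Python sketch of how such a pair can be read, assuming the range is inclusive (the usage text does not say so explicitly) and using a hypothetical helper name:

```python
def parse_spine_range(args):
    """Interpret ['0', '21'] as an inclusive (firstid, lastid) host ID range."""
    if len(args) != 2:
        raise ValueError("expected exactly two arguments: firstid lastid")
    first, last = int(args[0]), int(args[1])
    if first > last:
        raise ValueError("firstid must not be greater than lastid")
    return first, last


first, last = parse_spine_range("0 21".split())
# Number of host IDs covered if both ends are included.
ids_in_range = last - first + 1
```

Under that assumption, "0 21" covers 22 host IDs, which is where the guess of 22 devices comes from. IDs that are unused or deleted would make the real device count lower.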
| Scripts: Monitor processes | RFC1213 MIB | DOCSIS Stats | Dell PowerEdge | Speedfan | APC UPS | DOCSIS CMTS | 3ware | Motorola Canopy |
| Guides: Windows Install | [HOWTO] Debug Windows NTFS permission problems |
| Tools: Windows All-in-one Installer |
Re: Hanging spine.exe and sh.exe processes
Thanks for getting back to me so fast.
A question: in a lot of threads about "hung scripts" I see answers, very similar to what you said, about the script not being "well-written".
Are there examples anywhere of "well-written" scripts that I can compare the failing script against to see if it can be improved? I'm a programmer by trade and I try to solve practically every problem with software tweaks.
Re: Hanging spine.exe and sh.exe processes
Smart man
Well, it really depends on what language your script is using, but basically the queries need to include a [reasonable] timeout so they don't last forever.
Taking a quick look at the query_cisco4400_wap_cnt.php script, it at least uses the cacti_snmp_walk function, but it hardcodes a 2-second timeout per query. It also looks to be recursive for some reason, so the per-query timeouts can add up across many walks. Not being familiar with the cisco4400_wap, I'd hope there is a more efficient method to obtain the count... time for you to dig a bit.
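As a rough idea of what "timeout logic" can look like, here is a minimal Python sketch of a wrapper that gives the whole collection step one overall deadline instead of relying on per-query timeouts. The function name and the 10-second figure are assumptions for illustration, not anything taken from Cacti or from the script in question.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments); return its stdout, or None on timeout or failure."""
    try:
        # subprocess.run kills the child process itself if the timeout expires.
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def poll_value(cmd, timeout_s=10):
    """Return the collected value, or 'U' (RRDtool's unknown marker) if it failed."""
    value = run_with_timeout(cmd, timeout_s)
    return value if value else "U"
```

For example, `poll_value([sys.executable, "-c", "print(42)"])` returns "42", while a command that sleeps past the deadline comes back as "U" rather than blocking the poller. One caveat: only the direct child is killed, so if the command goes through a shell that spawns something else, the grandchild may survive, which could matter with the sh.exe setup described here.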
Re: Hanging spine.exe and sh.exe processes
One thing that puzzles me: when the PHP script gets hung up, all the CPU is being taken by sh.exe, not php.exe. You would think that if PHP were running the bad script, the php-cgi process (or whatever it's called) would be the one with the high CPU. But instead it's sh.exe, which should only really be in charge of invoking PHP.
Re: Hanging spine.exe and sh.exe processes
Is your Cygwin up to date? Is sh.exe being used to launch the PHP script? What's the command line? It could be that the script author is doing something non-standard with how it is being invoked. Look at the data source for it and see how it compares to other scripts.