Issues with Poller times
Issues with Poller times
Hi,
We have been using Cacti for about 6 months on Windows. We monitor and graph 258 Windows 2003 servers, mainly using WMI queries. Our poller usually completes in 160 - 180 seconds, which is fine. The strange thing is that I have come in on a couple of mornings to find that Cacti has not really been graphing all night because the poller overran. I have no idea what causes it to suddenly go from 160 seconds to over 300. Once it goes over, it doesn't recover until I kill all spine and sh tasks; then the process starts again and everything returns to normal.
I hope someone can give me some help and guidance on this, or maybe some kind of workaround to kill all processes when it goes mad.
Thanks
My poller settings are attached
Host Information
Cacti Version 0.8.7g
Plugin Architecture 2.8
Poller Type Spine 0.8.7g
Server Info Windows NT 5.2
Web Server Microsoft-IIS/6.0
PHP Version 5.2.14
PHP Extensions bcmath, calendar, com_dotnet, ctype, date, filter, ftp, hash, iconv, json, odbc, pcre, Reflection, session, libxml, standard, tokenizer, zlib, SimpleXML, dom, SPL, wddx, xml, xmlreader, xmlwriter, ISAPI, ldap, mysql, snmp, sockets
MySQL Version 5.1.51-community
RRDTool Version 1.2.30
Plugins
Global Plugin Settings (settings - v0.5)
Update Checker (update - v0.4)
Host Info (hostinfo - v0.2)
Login Page Mod (loginmod - v1.0)
Attachment: Poller.png (poller settings, 65.12 KiB)
Re: Issues with Poller times
Most likely a script is not terminating properly, which is causing spine to hang. The next time there is a hung Cacti-related process, use Sysinternals Process Explorer to find out what command line arguments the process was started with. That should give you an idea of where to start looking.
Also, with Cacti 0.8.7g, you should be using PIA 2.9, as it contains all the latest patches.
| Scripts: Monitor processes | RFC1213 MIB | DOCSIS Stats | Dell PowerEdge | Speedfan | APC UPS | DOCSIS CMTS | 3ware | Motorola Canopy |
| Guides: Windows Install | [HOWTO] Debug Windows NTFS permission problems |
| Tools: Windows All-in-one Installer |
Re: Issues with Poller times
Thanks for the advice!
I came in this morning and it was frozen again. There were 3 sh.exe processes running a Perl script to check memory. It looks like this script may be the issue. I will wait for the next occurrence to make sure it's just this Perl script that freezes.
I'll keep you posted.
Also, once I resolve the problem I will look to update PIA.
Cheers
Re: Issues with Poller times
Sadly, this morning there were 8 hung processes (sh.exe), and they were a mixture of all my scripts, both cscript and Perl.
So, in short, what you suggested about a script not terminating may still be the case, but it is not one particular script; it seems to be any of them.
Any suggestions on where to go from here?
Thanks
Re: Issues with Poller times
Were the scripts hung for a specific host?
Do the scripts have timeout logic built into them?
If you switch to cmd.php, does the problem go away?
Re: Issues with Poller times
Hi, Thanks for coming back to me.
Sadly no, it's multiple hosts.
Again, no. This may be something I should do. I have attached a script so you can see. They are very simple VBScript (which, to be fair, matches my VB skill level).
I tried this already, and cmd.php can't finish all hosts in 300 seconds, so I guess I can't answer the question.
Cheers
Attachment: w32_disk_stats.vbs.txt (968 Bytes)
Re: Issues with Poller times
I should probably have mentioned this earlier, but I think it may be a separate issue.
During each poller cycle I get between 2 and 4 "ERROR: The POPEN timed out" messages. It's never the same hosts.
If it's not related, that's cool. I just started wondering whether one of these timeouts could be what fails to let go of a script every now and then.
Cheers
Re: Issues with Poller times
Ah, well, as those scripts are doing WMI calls and I see you have it pause for 2 seconds, this makes for a very laggy and problematic script. Relying on Cacti to kill off the scripts when the polling cycle ends doesn't work well, as you've found out. I seem to recall that you can implement a timeout for WMI calls in VBScript (the Microsoft script resource center is a good place to look).
The spine popen issues I'm not familiar with. TheWitness will know more details about that (or search).
Re: Issues with Poller times
OK, I have changed all my data input methods to include a timeout option. I'll see how it goes over the weekend.
c:/windows/system32/cscript.exe //nologo //T:20 <path_cacti>/scripts/w32_disk_stats.vbs <hostname> <disk>
If this has no effect, I can add a timeout in the script itself if you think that would be better.
Thanks
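For comparison with the cscript //T:20 option above, here is a minimal Python sketch of the same idea enforced from outside the script: a wrapper that runs a command and kills it if it exceeds a time limit. It is not part of Cacti or spine; the function name run_with_timeout is hypothetical, and the 20-second figure simply mirrors the //T:20 value in the command line above.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, stdout_text).

    If the command runs longer than timeout_s seconds, subprocess.run
    kills the child process, waits for it, and raises TimeoutExpired,
    which is reported here as finished=False.
    """
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
        return True, done.stdout.strip()
    except subprocess.TimeoutExpired:
        return False, ""


if __name__ == "__main__":
    # Stand-in for a slow polling script: a child that sleeps too long.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    finished, output = run_with_timeout(slow, 2)
    print("finished:", finished)
```

One caveat worth keeping in mind: only the direct child is killed. If that child has started further processes of its own (for example a shell that launched the interpreter), those may survive, so this is a sketch of the approach rather than a guaranteed fix.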
Re: Issues with Poller times
Using the cscript timeout seems completely logical to me.
Re: Issues with Poller times
Still no joy. Gutted.
The cscript timeout didn't help. I still got the odd sh.exe hanging around taking up CPU, then another and another until it crashed.
I tried a timeout in the VB script as well, and again no joy.
To work around the issue, I have the attached script running as a scheduled task every 5 minutes. It checks the PID of every running sh.exe, and if the same PID is still running on the next pass, it kills it. It is working fine, but I'm not happy with this workaround.
Oh well.
Got any other ideas, or shall I just stick with my crappy workaround?
Thanks for your help
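The two-pass logic described in this post can be sketched in Python as follows. This is not the attached PID_KILL.bat, only an illustration of the idea: remember the sh.exe PIDs from the previous pass and kill any that are still present on the next one. The function names are hypothetical; the Windows-only helpers rely on the standard tasklist and taskkill commands.

```python
import csv
import subprocess


def stale_pids(previous, current):
    """Return the PIDs that appear in both passes, in ascending order."""
    return sorted(set(previous) & set(current))


def list_pids(image_name="sh.exe"):
    """Windows only: PIDs of running processes with this image name."""
    out = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    pids = []
    for row in csv.reader(out.splitlines()):
        # Data rows are "Image Name","PID",...; the "no tasks" notice is not.
        if len(row) > 1 and row[1].isdigit():
            pids.append(int(row[1]))
    return pids


def kill_pid(pid):
    """Windows only: forcefully end one process by PID."""
    subprocess.run(["taskkill", "/PID", str(pid), "/F"], check=False)


def one_pass(previous):
    """Kill survivors from the previous pass; return the PIDs seen now."""
    current = list_pids()
    for pid in stale_pids(previous, current):
        kill_pid(pid)
    return current
```

A scheduled task would need to persist the result of one_pass between runs (for example in a small text file). Note that a PID alone does not prove it is the same process, since Windows can reuse PIDs, so a more careful version might also compare process start times.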
Attachment: PID_KILL.bat.txt (389 Bytes)
Re: Issues with Poller times
Years ago, I too was running into this problem of scripts never terminating. I came up with a batch script which would run Sysinternals pskill every hour or so. Then I installed UPHClean and it helped a lot. You might give it a try.
When you examine the running processes with Sysinternals Process Explorer, what are they doing? Does the call stack (you need to configure the symbols) show anything useful? Is there a pattern to which hosts / scripts are constantly hanging?
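To look for a pattern across hung processes, one option is to copy their command lines out of Process Explorer and tally them by script and by host. The Python sketch below assumes the layout used earlier in this thread, where the hostname is the first argument after the script path; tally_hung is a hypothetical helper, and paths containing spaces would need smarter parsing than a plain split.

```python
from collections import Counter


def tally_hung(command_lines):
    """Count hung command lines per script file and per host argument.

    Assumes the first token ending in .vbs or .pl is the script and the
    token right after it is the hostname.
    """
    by_script, by_host = Counter(), Counter()
    for line in command_lines:
        parts = line.split()
        for i, part in enumerate(parts):
            if part.lower().endswith((".vbs", ".pl")):
                name = part.replace("\\", "/").rsplit("/", 1)[-1]
                by_script[name] += 1
                if i + 1 < len(parts):
                    by_host[parts[i + 1]] += 1
                break
    return by_script, by_host


if __name__ == "__main__":
    # Illustrative input only; the path and host names are made up.
    sample = [
        "cscript.exe //nologo //T:20 c:/cacti/scripts/w32_disk_stats.vbs srv01 C:",
        "perl c:/cacti/scripts/mem.pl srv02",
    ]
    scripts, hosts = tally_hung(sample)
    print(scripts.most_common(), hosts.most_common())
```

If one script or one host dominates the counts over several mornings, that is where to dig first; an even spread points more toward the poller or the shell wrapper than toward any individual script.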