Issues with Poller times

Post support questions that relate to the Windows 2003/2000/XP operating systems.

crono771
Posts: 10
Joined: Mon Jan 31, 2011 5:18 am

Issues with Poller times

Post by crono771 »

Hi,

We have been using Cacti on Windows for about 6 months. We monitor and graph 258 Windows 2003 servers, mainly using WMI queries. Our poller usually completes in 160 to 180 seconds, which is fine. The strange thing is that I have come in on a couple of mornings to find that Cacti has barely graphed anything all night because the poller overran. I have no idea what causes it to jump suddenly from 160 seconds to over 300. Once it goes over, it doesn't recover until I kill all the spine and sh tasks; when the process starts again, everything returns to normal.

I hope someone can give me some help and guidance on this, or maybe some kind of workaround to kill all the processes when it goes mad.

Thanks

My poller settings are attached

Host Information

Cacti Version 0.8.7g
Plugin Architecture 2.8
Poller Type Spine 0.8.7g
Server Info Windows NT 5.2
Web Server Microsoft-IIS/6.0
PHP Version 5.2.14
PHP Extensions bcmath, calendar, com_dotnet, ctype, date, filter, ftp, hash, iconv, json, odbc, pcre, Reflection, session, libxml, standard, tokenizer, zlib, SimpleXML, dom, SPL, wddx, xml, xmlreader, xmlwriter, ISAPI, ldap, mysql, snmp, sockets
MySQL Version 5.1.51-community
RRDTool Version 1.2.30
Plugins
Global Plugin Settings (settings - v0.5)
Update Checker (update - v0.4)
Host Info (hostinfo - v0.2)
Login Page Mod (loginmod - v1.0)
Attachments
Poller settings
Poller.png (65.12 KiB) Viewed 1679 times
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Re: Issues with Poller times

Post by BSOD2600 »

Most likely a script is not properly terminating, which is causing spine to hang. The next time there is a hung cacti related process, use sysinternals process explorer to find out what command line arguments the process was started with. That should give you an idea where to start to look.

Also, with Cacti 0.8.7g, you should be using PIA 2.9, as it contains all the latest patches.
crono771
Posts: 10
Joined: Mon Jan 31, 2011 5:18 am

Re: Issues with Poller times

Post by crono771 »

Thanks for the advice!

I came in this morning and it was frozen again. There were 3 sh.exe processes running a Perl script that checks memory. It looks like this script may be the issue. I will wait for the next occurrence to confirm that it's just this Perl script that freezes.

I'll keep you posted.

Also, once I resolve the problem I will look to update PIA.

Cheers
crono771
Posts: 10
Joined: Mon Jan 31, 2011 5:18 am

Re: Issues with Poller times

Post by crono771 »

Sadly, this morning I had 8 hung processes (sh.exe), and they were a mixture of all my scripts, both cscript and Perl.

So in short, what you suggested about a script not terminating may still be the case, but it is not one particular script. It seems to be any of them.

Any suggestions on where to go from here?

Thanks
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Re: Issues with Poller times

Post by BSOD2600 »

Were the scripts hung for a specific host?
Do the scripts have timeout logic built into them?
If you switch to cmd.php, does the problem go away?
crono771
Posts: 10
Joined: Mon Jan 31, 2011 5:18 am

Re: Issues with Poller times

Post by crono771 »

Hi, Thanks for coming back to me.

Sadly no, it's multiple hosts.

Again, no. This may be something I should do. I have attached a script so you can see. They are very simple VB (which, to be fair, is also my VB skill level :D )

I tried this already, and cmd.php can't finish all hosts in 300 seconds, so I guess I can't answer the question.

Cheers
Attachments
w32_disk_stats.vbs.txt
(968 Bytes) Downloaded 70 times
crono771
Posts: 10
Joined: Mon Jan 31, 2011 5:18 am

Re: Issues with Poller times

Post by crono771 »

I should probably have mentioned this earlier, but I think it may be a separate issue.

During each poller cycle I get between 2 and 4 "ERROR: The POPEN timed out" messages. It's never the same hosts.

If it's not related, that's cool. I just started wondering whether one of these timeouts could be failing to let go of a script every now and then?

Cheers
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Re: Issues with Poller times

Post by BSOD2600 »

Ah, well, those scripts are doing WMI calls and I see you also have a 2-second pause in there, which makes for a very laggy and problematic script. Relying on Cacti to kill off the scripts when the polling cycle ends doesn't work well, as you've found out. I seem to recall that you can implement a timeout for WMI calls in VBScript (the Microsoft Script Resource Center is a good place to look).
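The general idea of bounding a script's run time from the outside, rather than waiting for the poller to clean up, can be sketched in Python. This is only an illustration of the technique, not something from the attached scripts: the helper name `run_with_timeout` is made up, and the sleeping child process stands in for a slow WMI call.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run command; return its stdout, or None if it exceeded the timeout."""
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child it started before raising this.
        return None
    return completed.stdout.strip()


if __name__ == "__main__":
    # Stand-in for a slow WMI script: a child that sleeps for 30 seconds.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(slow, 1))
```

Note that this only kills the process it launched directly. If that process is a shell that in turn starts cscript or perl, the grandchild may survive, which is one possible reason a wrapper-level timeout can still leave processes behind.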

The spine popen issues I'm not familiar with. TheWitness will know more details about that (or search).
crono771
Posts: 10
Joined: Mon Jan 31, 2011 5:18 am

Re: Issues with Poller times

Post by crono771 »

Ok, I have changed all my data input methods to include a timeout option. I'll see how it goes over the weekend

c:/windows/system32/cscript.exe //nologo //T:20 <path_cacti>/scripts/w32_disk_stats.vbs <hostname> <disk>

If this has no effect, I can add a timeout in the script if you think that would be better.

Thanks
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Re: Issues with Poller times

Post by BSOD2600 »

Using the cscript timeout seems completely logical to me.
crono771
Posts: 10
Joined: Mon Jan 31, 2011 5:18 am

Re: Issues with Poller times

Post by crono771 »

Still no joy. Gutted.

The cscript timeout didn't help. I still got the odd sh.exe hanging around taking up CPU, then another and another until it crashed.

I tried a timeout in the VB script and again no joy.

To work around the issue, I have the attached script running as a scheduled task every 5 minutes. It checks the PID of every running sh.exe, and if the same PID is still running on the next pass, it kills it. Things are working fine with this, but I'm not happy with the workaround.
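For readers who cannot open the attachment, the two-pass logic described above might look roughly like this in Python. It is a sketch under assumptions, not a translation of PID_KILL.bat: the function names are made up, it assumes the standard Windows `tasklist` and `taskkill` commands are available, and the caller has to keep the returned PID set somewhere (a small file, for example) between scheduled runs.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(output):
    """Return the set of PIDs found in `tasklist /FO CSV /NH` output."""
    pids = set()
    for row in csv.reader(io.StringIO(output)):
        # Expected columns: image name, PID, session name, session#, memory.
        if len(row) >= 2 and row[1].isdigit():
            pids.add(int(row[1]))
    return pids


def list_sh_pids():
    """Ask Windows for the PIDs of every running sh.exe."""
    result = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq sh.exe", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
    )
    return parse_tasklist_csv(result.stdout)


def kill_pid(pid):
    """Forcefully end one process by PID."""
    subprocess.run(["taskkill", "/PID", str(pid), "/F"])


def watchdog_pass(previous, current, kill_fn):
    """Kill PIDs seen on both passes; return the set to remember next time."""
    doomed = previous & current
    for pid in sorted(doomed):
        kill_fn(pid)
    return current - doomed
```

A scheduled run would call `watchdog_pass(saved_pids, list_sh_pids(), kill_pid)` and save the result. One caveat with any PID-based check: Windows can reuse a PID, so a new sh.exe could in principle be mistaken for an old one if the interval between passes is long.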

oh well

Got any other ideas, or shall I just stick with my crappy workaround?

Thanks for your help
Attachments
PID_KILL.bat.txt
(389 Bytes) Downloaded 226 times
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Re: Issues with Poller times

Post by BSOD2600 »

Years ago, I too ran into this problem of scripts never terminating. I came up with a batch script that ran Sysinternals pskill every hour or so. Then I installed UPHClean and it helped a lot. You might give it a try.

When you examine the hung processes with Sysinternals Process Explorer, what are they doing? Does the call stack (you need to configure the symbols) show anything useful? Is there a pattern to which hosts or scripts keep hanging?
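One way to look for such a pattern is to collect the command lines of the hung processes over a few days and tally them by script and by host. The Python sketch below assumes each command line follows the shape of the data input method shown earlier in the thread (script path, then hostname); the function names are hypothetical, and it will not handle paths that contain spaces.

```python
from collections import Counter

SCRIPT_EXTENSIONS = (".vbs", ".pl")


def script_and_host(command_line):
    """Return (script file name, hostname argument) or (None, None)."""
    tokens = command_line.split()
    for index, token in enumerate(tokens):
        if token.lower().endswith(SCRIPT_EXTENSIONS):
            # Keep only the file name, whichever slash style the path uses.
            script = token.replace("\\", "/").rsplit("/", 1)[-1]
            # Assumption: the first argument after the script is the host.
            host = tokens[index + 1] if index + 1 < len(tokens) else None
            return script, host
    return None, None


def tally(command_lines):
    """Count hung processes per script and per host."""
    by_script = Counter()
    by_host = Counter()
    for line in command_lines:
        script, host = script_and_host(line)
        if script:
            by_script[script] += 1
        if host:
            by_host[host] += 1
    return by_script, by_host
```

If one host or one script dominates `by_host.most_common()` or `by_script.most_common()`, that is the place to start; an even spread would point more toward the poller or the shell layer than toward any single script.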