First of all, kudos to claymen for this great script. It greatly extends the usability of Cacti. Thanks!
I managed to create some graphs for monitoring the read and write latencies on our HP EVA. The only problem is that my graphs are full of gaps. The Cacti log shows the following for an unsuccessful poll:
Could be, but it's hard to know from the Cacti logs you posted.
Set up the log file path and enable debug level 2. This will write out a heap of details about each run, which will hopefully help you pinpoint what's causing the problem.
Not sure what to look for here. After setting the log level to debug, the line where the result should appear is still the same, with no reason for the 'U' that I can see.
No, not the Cacti log, but the actual wmi.php logs, which you enable by setting debug level 2. You will find a stack of log files in the path you specified.
Only one file is created there, for the only collection that is currently successful. No debug files are generated for the data sources that give "output: U".
EDIT: Actually, all data sources from this host have stopped working, giving the following error: NTSTATUS: NT code 0xc002001b - NT code 0xc002001b
OK, I had to reboot the Windows host; it suddenly refused to answer any WMI requests.
Now it's working again, and for every missing data point in Cacti there is no entry in the WMI logging.
When I manually run the command in quick succession, I sometimes get the error "NTSTATUS: NT code 0xc00706be - NT code 0xc00706be" or "NTSTATUS: NT code 0xc00706ba - NT code 0xc00706ba".
Thomas.Pacce wrote: Why do I get broken graphs like the one attached?
I have set up this graph for a number of hosts; some display correctly, whereas others are messed up.
Not sure, mate. It looks like it's not getting results back properly. Again, if you set up level 2 debugging to dump out the logs of what's going on, it might give you a better idea.
Debug level 2 dumps out a heap of info: all the inputs, the direct output, the exact command being run; basically everything you need to see what's going on.
hapklaar wrote: OK, I had to reboot the Windows host; it suddenly refused to answer any WMI requests.
Now it's working again, and for every missing data point in Cacti there is no entry in the WMI logging.
When I manually run the command in quick succession, I sometimes get the error "NTSTATUS: NT code 0xc00706be - NT code 0xc00706be" or "NTSTATUS: NT code 0xc00706ba - NT code 0xc00706ba".
From memory, isn't that "RPC server unavailable"?
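For anyone wanting to check the RPC guess, these NTSTATUS values can be split into their bit fields. Below is a rough Python sketch, assuming the standard NTSTATUS layout (2 severity bits, a customer bit, a 12-bit facility, a 16-bit code) and that facility 7 wraps a Win32 error number in the low 16 bits. `decode_ntstatus`, `describe` and the lookup table are hypothetical helpers; the table only covers the two codes from this thread: 0x6ba is Win32 error 1722 (RPC_S_SERVER_UNAVAILABLE) and 0x6be is 1726 (RPC_S_CALL_FAILED).

```python
# Assumed NTSTATUS layout: bits 30-31 severity, bit 29 customer flag,
# bit 28 reserved, bits 16-27 facility, bits 0-15 code.
SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}

# Only the two Win32 RPC errors seen in this thread; extend as needed.
WIN32_ERRORS = {
    1722: "RPC_S_SERVER_UNAVAILABLE (The RPC server is unavailable.)",
    1726: "RPC_S_CALL_FAILED (The remote procedure call failed.)",
}


def decode_ntstatus(value: int) -> dict:
    """Split a 32-bit NTSTATUS value into its fields."""
    return {
        "severity": SEVERITY_NAMES[(value >> 30) & 0x3],
        "customer": bool((value >> 29) & 0x1),
        "facility": (value >> 16) & 0xFFF,
        "code": value & 0xFFFF,
    }


def describe(value: int) -> str:
    fields = decode_ntstatus(value)
    text = (f"0x{value:08x}: severity={fields['severity']} "
            f"facility={fields['facility']} code={fields['code']}")
    # Assumption: facility 7 means the low 16 bits hold a Win32 error number.
    if fields["facility"] == 7 and fields["code"] in WIN32_ERRORS:
        text += f" -> Win32 {fields['code']} {WIN32_ERRORS[fields['code']]}"
    return text


if __name__ == "__main__":
    # The codes reported earlier in this thread.
    for status in (0xC00706BA, 0xC00706BE, 0xC002001B):
        print(describe(status))
```

Decoded this way, 0xc00706ba is "RPC server unavailable" and 0xc00706be is "remote procedure call failed", which points to an intermittent RPC problem rather than a bad query.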
It looks like it. However, if I try again right after that, I do get a result. This could also be causing Thomas's problem. It occurs on multiple Win2k3 hosts, by the way. Would it be possible to include a retry in the script? Or do you know what might cause this?
PS: Since I started monitoring our EVAs using this script, the WMI service seems to crash once a day. Only killing the process and starting the service gets it back on track.
Adding a retry has the potential to blow out your poll time. Remember, you have to get every result in under 5 minutes (by default), and adding a retry means every data source using the script could potentially double its run time (or more, depending on the number of tries).
I'm not saying you can't, but it could cause other problems with your poller.
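One way to add a retry without letting it run away with the poll time is to cap both the number of attempts and the wall-clock time spent per data source. The Python sketch below is illustrative only: `call_with_retry` and its parameters are hypothetical and not part of wmi.php, and `flaky_query` is a simulated stand-in for the real WMI call. On failure it returns None, so the caller can still print 'U' as before.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 2,
    budget_s: float = 10.0,
    delay_s: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fn until it succeeds, `attempts` tries are used up, or waiting
    for another try would pass the `budget_s` deadline. Returns None if
    every try failed; exceptions not in `retry_on` propagate."""
    deadline = clock() + budget_s
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            # Stop on the last try, or if waiting delay_s would pass the deadline.
            if attempt == attempts or clock() + delay_s >= deadline:
                return None
            sleep(delay_s)
    return None


if __name__ == "__main__":
    tries = []

    def flaky_query() -> str:
        # Simulated WMI query: the first call fails with an RPC error,
        # the second one succeeds.
        tries.append(1)
        if len(tries) == 1:
            raise RuntimeError("NTSTATUS: NT code 0xc00706ba - NT code 0xc00706ba")
        return "1234"

    result = call_with_retry(flaky_query, attempts=2, budget_s=5.0, delay_s=0.5)
    print(result if result is not None else "U")
```

The `budget_s` cap stops further retries once the deadline would be passed, which bounds the extra cost per data source. A single slow call can still overrun it, so the underlying command's own timeout still matters.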
hapklaar wrote: OK, but that's not really an issue for me, as a full poll currently takes a little over 60 seconds. And that's with cmd.php.
I really can't figure out why that error pops up so often; I was hoping you might...
Wow, 60 seconds seems high.
We have the following and get a poll time of about 30 seconds:
10k+ data sources
500+ WMI data sources
450 hosts total
As long as it's under 300 seconds, there's no problem.
Maybe you are using Spine or another fast poller? Or maybe most of your hosts are local. Most of my 8k data sources are at remote (slow) sites.
True, so long as it's under 300 it works, but ideally the faster the better. The more headroom you have, the better you can cope with spikes in poll time.
Our data sources span the entire country, but we are an ISP, so the links between them are all nice and fast. If they were slow, our customers wouldn't be too impressed...
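To put rough numbers on the headroom point: with the default 300-second interval, the spare time is the interval minus the poll time, and a retry only multiplies the part of the poll spent on the retried sources. The Python sketch below uses a simple serial model (sources polled one after another), so it is an upper bound; a multi-threaded poller will usually do better. The helper names are hypothetical, and the 20-second WMI share in the example is a made-up figure, not a measurement from this thread.

```python
def headroom(poll_s: float, interval_s: float = 300.0) -> float:
    """Seconds left in the polling interval once the poll has finished."""
    return interval_s - poll_s


def worst_case_poll(poll_s: float, retried_s: float, attempts: int) -> float:
    """Serial-model poll time if the data sources that take retried_s
    seconds in total each ran `attempts` times instead of once."""
    return poll_s + retried_s * (attempts - 1)


if __name__ == "__main__":
    for poll in (30.0, 60.0):  # poll times quoted in this thread
        print(f"{poll:.0f} s poll leaves {headroom(poll):.0f} s of a 300 s interval")
    # Hypothetical: 20 s of the 60 s poll is WMI, and every WMI source fails once.
    slow = worst_case_poll(60.0, 20.0, attempts=2)
    print(f"worst case with 2 attempts: {slow:.0f} s, headroom {headroom(slow):.0f} s")
```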