Monitor Windows via WMI from Cacti on Linux

hapklaar
Posts: 38
Joined: Tue May 31, 2005 10:06 am

Post by hapklaar »

First of all, kudos to claymen for this great script. It extends the usability of Cacti so much. Thanks!

I managed to create some graphs for monitoring the read and write latencies on our HP EVA. The only problem is that my graphs are full of gaps. For an unsuccessful poll, the Cacti log shows:

Code:

05/25/2009 10:00:57 PM - CMDPHP: Poller[0] Host[266] DS[4867] CMD: /usr/bin/php -q /usr/share/cacti/site/scripts/wmi.php -h 'andpm01' -u '/etc/cacti/auth.txt' -w 'Win32_PerfFormattedData_EVAPMEXT_HPEVAPhysicalDiskGroup' -n '' -k 'Name' -v 'ANEVA101 - DiskGroup 2' -c 'ReadLatencyus,WriteLatencyus', output: U
and for a successful poll of the exact same data source:

Code:

05/25/2009 09:55:52 PM - CMDPHP: Poller[0] Host[266] DS[4867] CMD: /usr/bin/php -q /usr/share/cacti/site/scripts/wmi.php -h 'andpm01' -u '/etc/cacti/auth.txt' -w 'Win32_PerfFormattedData_EVAPMEXT_HPEVAPhysicalDiskGroup' -n '' -k 'Name' -v 'ANEVA101 - DiskGroup 2' -c 'ReadLatencyus,WriteLatencyus', output: Name:ANEVA101_-_DiskGroup_2 ReadLatencyus:13735 WriteLatencyus:12733
Why do some polls return an invalid output? Could this be a timeout issue?

EDIT: added a graph to show the problem:
Attachment: graph_image.php.png
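For anyone scripting a check around this, here is a minimal Python sketch of telling a valid result from "U". It assumes the output is always the space-separated Key:value pairs shown in the successful poll above. The function name parse_poller_output is hypothetical and not part of wmi.php or Cacti.

```python
from typing import Dict, Optional


def parse_poller_output(output: str) -> Optional[Dict[str, float]]:
    """Return the numeric fields of a 'Key:value Key:value' line.

    Returns None when the output is 'U', empty, malformed, or has no
    numeric fields, i.e. when a graph would show a gap.
    """
    text = output.strip()
    if not text or text == "U":
        return None

    values: Dict[str, float] = {}
    for field in text.split():
        key, sep, raw = field.partition(":")
        if not sep:
            # A field without a colon means the line is not in the expected format.
            return None
        try:
            values[key] = float(raw)
        except ValueError:
            # Non-numeric fields such as Name are skipped, not treated as errors.
            continue
    return values or None


if __name__ == "__main__":
    good = "Name:ANEVA101_-_DiskGroup_2 ReadLatencyus:13735 WriteLatencyus:12733"
    print(parse_poller_output(good))
    print(parse_poller_output("U"))
```

Running the wmi.php command by hand in a loop and feeding each result through a check like this would show how often the "U" comes back, independent of the poller.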
claymen
Cacti User
Posts: 259
Joined: Mon Aug 18, 2008 4:30 am
Location: Australia

Post by claymen »

Could be, but it's hard to tell from the Cacti logs you posted.

Set up the log file path and enable debug level 2. This will write out a heap of details about each run. Hopefully it will help you pinpoint what's causing the problem.
hapklaar
Posts: 38
Joined: Tue May 31, 2005 10:06 am

Post by hapklaar »

Not sure what to look for here. After setting the log level to debug, the line where the result should be is still the same, with no reason given for the 'U' as far as I can see.

Code:

05/26/2009 09:30:48 AM - CMDPHP: Poller[0] Host[266] DS[4868] WARNING: Result from CMD not valid.  Partial Result: U
05/26/2009 09:30:48 AM - CMDPHP: Poller[0] Host[266] DS[4868] CMD: /usr/bin/php -q /usr/share/cacti/site/scripts/wmi.php -h 'andpm01' -u '/etc/cacti/auth.txt' -w 'Win32_PerfFormattedData_EVAPMEXT_HPEVAPhysicalDiskGroup' -n '' -k 'Name' -v 'ANEVA101 - DiskGroup 3' -c 'ReadLatencyus,WriteLatencyus', output: U
Or were you not referring to the cacti log?
claymen
Cacti User
Posts: 259
Joined: Mon Aug 18, 2008 4:30 am
Location: Australia

Post by claymen »

No, not the Cacti log but the actual wmi.php logs, which you enable by setting debug level 2. You will find a stack of log files in the path you specified.
hapklaar
Posts: 38
Joined: Tue May 31, 2005 10:06 am

Post by hapklaar »

Only one file is created there, for the only collection that is successful at the moment. No debug files are generated for the data sources that give "output: U".

EDIT: actually all datasources from this host stopped working, getting the following error: NTSTATUS: NT code 0xc002001b - NT code 0xc002001b
hapklaar
Posts: 38
Joined: Tue May 31, 2005 10:06 am

Post by hapklaar »

OK, I had to reboot the Windows host; all of a sudden it refused to answer any WMI requests.

Now it's working again, and for every missing data point in Cacti there is no entry in the WMI logging.

When I manually run the command in quick succession I sometimes get the error "NTSTATUS: NT code 0xc00706be - NT code 0xc00706be" or "NTSTATUS: NT code 0xc00706ba - NT code 0xc00706ba"
Thomas.Pacce
Posts: 23
Joined: Wed Apr 15, 2009 5:19 am
Location: Amsterdam

Post by Thomas.Pacce »

Why do I get broken graphs like the one attached?

I have set up this graph for a number of hosts; some display correctly whereas others are messed up.
Attachment: ScreenShot001.jpg
claymen
Cacti User
Posts: 259
Joined: Mon Aug 18, 2008 4:30 am
Location: Australia

Post by claymen »

hapklaar wrote:OK, I had to reboot the Windows host; all of a sudden it refused to answer any WMI requests.

Now it's working again, and for every missing data point in Cacti there is no entry in the WMI logging.

When I manually run the command in quick succession I sometimes get the error "NTSTATUS: NT code 0xc00706be - NT code 0xc00706be" or "NTSTATUS: NT code 0xc00706ba - NT code 0xc00706ba"
From memory isn't that RPC server unavailable?
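For reference, the codes quoted in this thread can be decoded with a small lookup. This is a sketch under two assumptions: that the 0xc007xxxx values are Win32 error codes wrapped in the NTSTATUS Win32 facility (facility 0x7), and that the names below, taken from the Windows headers, are the relevant ones. The helper describe_nt_code and its tables are hypothetical and cover only the three codes mentioned here.

```python
# Win32 RPC error codes (winerror.h), keyed by the low 16 bits of the NT code.
WIN32_RPC_ERRORS = {
    0x06BA: "RPC_S_SERVER_UNAVAILABLE: the RPC server is unavailable",  # 1722
    0x06BE: "RPC_S_CALL_FAILED: the remote procedure call failed",      # 1726
}

# Native NTSTATUS values (ntstatus.h) that are not wrapped Win32 errors.
NATIVE_NTSTATUS = {
    0xC002001B: "RPC_NT_CALL_FAILED: the remote procedure call failed",
}

FACILITY_NTWIN32 = 0x7  # NTSTATUS facility used for wrapped Win32 errors


def describe_nt_code(code: int) -> str:
    """Give a best-effort description of an 'NT code 0x...' value."""
    if code in NATIVE_NTSTATUS:
        return NATIVE_NTSTATUS[code]

    facility = (code >> 16) & 0xFFF   # bits 16-27 of the NTSTATUS
    win32_error = code & 0xFFFF       # bits 0-15 of the NTSTATUS
    if facility == FACILITY_NTWIN32:
        name = WIN32_RPC_ERRORS.get(win32_error, "not in this small table")
        return f"Win32 error {win32_error} ({name})"
    return f"unrecognised NT code 0x{code:08x}"


if __name__ == "__main__":
    for code in (0xC00706BA, 0xC00706BE, 0xC002001B):
        print(f"0x{code:08x}: {describe_nt_code(code)}")
```

On that reading, 0xc00706ba is "RPC server unavailable", while 0xc00706be and 0xc002001b both mean the remote procedure call itself failed.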
claymen
Cacti User
Posts: 259
Joined: Mon Aug 18, 2008 4:30 am
Location: Australia

Post by claymen »

Thomas.Pacce wrote:Why do I get broken graphs like the one attached?

I have set up this graph for a number of hosts; some display correctly whereas others are messed up.
Not sure mate, looks like it's not getting results back properly. Again, if you set up the level 2 debugging to dump out the logs of what's going on, it might give you a better idea.

Debug level 2 dumps out a heap of info: all the inputs, the direct output, the exact command being run, basically everything you need to know to see what's going on.
hapklaar
Posts: 38
Joined: Tue May 31, 2005 10:06 am

Post by hapklaar »

claymen wrote:From memory isn't that RPC server unavailable?
It looks like it. However, if I try again right after that, I do get a result. This could be causing Thomas's problem as well. It occurs on multiple win2k3 hosts, by the way. Would it be possible to include a retry in the script? Or do you know what might cause this?

PS: Once a day the WMI service seems to crash, ever since I started monitoring our EVAs with this script. Only killing the process and restarting the service gets it back on track.
claymen
Cacti User
Posts: 259
Joined: Mon Aug 18, 2008 4:30 am
Location: Australia

Post by claymen »

Adding a retry has the potential to blow out your poll time. Remember you have to get every result in under 5 minutes (by default) and adding a retry means every data source using the script has a potential of doubling its time to run (or more depending on number of tries).

I'm not saying you can't but there is a potential for it to cause other problems with your poller.
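One way to get a retry without an open-ended cost is to cap both the number of attempts and the total time spent per data source. The wrapper below is a minimal Python sketch of that idea, not a change to wmi.php itself. The name run_with_retry, its defaults, and the rule that a result of "U", an empty result, or anything containing "NTSTATUS" counts as a failure are all assumptions for illustration.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=2, budget_seconds=10.0, pause_seconds=1.0,
                   run=subprocess.run):
    """Run cmd up to `attempts` times, never exceeding `budget_seconds` overall.

    Returns the first usable output, or 'U' if every attempt fails or the
    time budget runs out. `run` is injectable so the logic can be tested
    without a real WMI host.
    """
    deadline = time.monotonic() + budget_seconds
    for attempt in range(attempts):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # out of time: give up rather than delay the poller
        try:
            result = run(cmd, capture_output=True, text=True, timeout=remaining)
        except subprocess.TimeoutExpired:
            break  # the call itself used up the whole budget
        output = result.stdout.strip()
        # Assumed failure signals: non-zero exit, empty output, 'U', or an NTSTATUS error.
        if (result.returncode == 0 and output and output != "U"
                and "NTSTATUS" not in output):
            return output
        if attempt + 1 < attempts:
            # Brief pause before retrying, clipped to the time that is left.
            time.sleep(min(pause_seconds, max(0.0, deadline - time.monotonic())))
    return "U"


# Example call, using the command line from the Cacti log earlier in the thread:
# run_with_retry(["/usr/bin/php", "-q", "/usr/share/cacti/site/scripts/wmi.php",
#                 "-h", "andpm01", "-u", "/etc/cacti/auth.txt",
#                 "-w", "Win32_PerfFormattedData_EVAPMEXT_HPEVAPhysicalDiskGroup",
#                 "-n", "", "-k", "Name", "-v", "ANEVA101 - DiskGroup 2",
#                 "-c", "ReadLatencyus,WriteLatencyus"])
```

The budget is what keeps the worst case bounded: however many attempts are configured, a single data source can never hold the poller for longer than budget_seconds.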
hapklaar
Posts: 38
Joined: Tue May 31, 2005 10:06 am

Post by hapklaar »

Ok, but that's not really an issue for me as currently a total poll takes a little over 60 seconds. And that's with cmd.php.

I really can't figure out why that error pops up that often, was hoping you might...
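As a rough check of the numbers in this exchange, here is a small Python sketch of the worst-case arithmetic. It assumes the pessimistic case from the previous post, where every data source is retried and each attempt costs as much as a normal poll, and it uses 60 seconds as a round figure for "a little over 60 seconds". The function names are hypothetical.

```python
POLL_INTERVAL_SECONDS = 300  # default 5-minute polling interval


def worst_case_poll_seconds(current_poll_seconds, attempts):
    """Upper bound on poll time if every data source needed every attempt."""
    return current_poll_seconds * attempts


def headroom_seconds(poll_seconds, interval_seconds=POLL_INTERVAL_SECONDS):
    """Time left in the polling interval after the poll finishes."""
    return interval_seconds - poll_seconds


if __name__ == "__main__":
    worst = worst_case_poll_seconds(60, attempts=2)
    print(worst, headroom_seconds(worst))
```

With two attempts the bound is 120 seconds, leaving 180 seconds of the 300-second interval. The estimate ignores the fact that a failing WMI call may take longer than a successful one, so real headroom could be smaller.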
claymen
Cacti User
Posts: 259
Joined: Mon Aug 18, 2008 4:30 am
Location: Australia

Post by claymen »

Wow, 60 seconds seems high.

We have the following and get a poll time of about 30 seconds:
10k+ data sources
500+ wmi data sources
450 hosts total
hapklaar
Posts: 38
Joined: Tue May 31, 2005 10:06 am

Post by hapklaar »

As long as it's under 300 secs, there's no problem :wink:

Maybe you are using spine or another fast poller? Or maybe most of your hosts are local. Most of my 8k data sources are on remote (slow) sites.
claymen
Cacti User
Posts: 259
Joined: Mon Aug 18, 2008 4:30 am
Location: Australia

Post by claymen »

True, so long as it's under 300 it works, but ideally the faster the better ;) The more headroom you have, the better you can cope with spikes in poll time.

Our data sources span the entire country but we are an ISP so the links between them all are nice and fast. If they were slow our customers wouldn't be too impressed...