Windows performance counters & VBS/WMI via SNMP

Post support questions that relate to the Windows 2003/2000/XP operating systems.

flukester
Posts: 17
Joined: Fri Jan 16, 2009 3:01 pm
Location: Montreal, QC, Canada

Post by flukester »

flukester wrote:
Hello!

I tried 2.0.0.14 (according to the history.txt file in the zip) and am still having some issues during snmpwalks.

Actually, the new version is working fine now! :)

Looks like I had some corruption in my counters.ini (it was transferred over a CIFS share across a VPN): only the first 64 KB of the file held valid data, and the rest was padded with zeros up to the proper file length.

Now that I think about it, it might have been the problem for some time and I may have had you debugging a non-existent problem with your code...

sorry about that :(


I'll be adding a few hundred more counters today, and I'll let you know once I'm done whether everything is working as intended!

Thanks!

Antoine
Nothing is as simple as it seems at first
Or as hopeless as it seems in the middle
Or as finished as it seems in the end.
erwan.l
Cacti User
Posts: 138
Joined: Tue Jan 22, 2008 4:36 am

Post by erwan.l »

Hi flukester,

There was actually a bug.
Only 16k was allocated for the OID buffer.
Considering that some of your OIDs are up to 128 characters long and that you have about 500 entries for now, that buffer was being overrun.
The buffer is now 32k.

This and some other smaller improvements were added.
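
A rough back-of-the-envelope check of the figures quoted above, as a sketch only. It assumes the buffer has to hold every OID string at once and that each one takes its maximum length; the post does not say how the buffer is actually laid out, and the function name is hypothetical rather than taken from snmptools.

```python
def worst_case_bytes(entries: int, max_oid_chars: int) -> int:
    """Upper bound on the space needed if every OID string had the maximum length."""
    return entries * max_oid_chars


needed = worst_case_bytes(500, 128)  # figures quoted in the post
old_buffer = 16 * 1024
new_buffer = 32 * 1024

print("worst case:", needed, "bytes")
print("fits in 16k:", needed <= old_buffer)
print("fits in 32k:", needed <= new_buffer)
```

Under that worst-case assumption even 32k would not be enough, so the real OIDs are presumably much shorter than 128 characters on average. Sizing the buffer from the actual entry count would avoid having to guess.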

Regards,
Erwan.
flukester
Posts: 17
Joined: Fri Jan 16, 2009 3:01 pm
Location: Montreal, QC, Canada

Post by flukester »

Hi Erwan!

I've completed adding all the counters I needed on my test host (manually preparing my counters.ini).

All the functionality is there (thanks!), but performance is starting to become an issue. The one-second delay to get the "per-second" counters is killing me. Right now, my test system is accessible only over a VPN, which has higher latency than a LAN would, and that of course does not help.

Could you please seriously evaluate whether you need to have a full second wait per counter or if shorter values could be used? (1/4 or 1/10 of a second maybe?)

(I know it could skew the results to a certain extent, depending on the calculation you have to perform to scale the result back to an x/sec value.) I wish those values were simple counters so we could let Cacti calculate the average.

Another thing that could help would be to modify the code so that it processes multiple requests in parallel. Right now, if I launch a few snmpwalks in parallel (from different shells, for example), it almost looks as if snmpd.exe is serializing access to your DLL (and maybe it does?).

If I launch 4 walks at the same time on a branch with all per-sec counters, one of them receives an answer each second (so each of them taken separately receives an answer every 4 seconds).

What I mean to say is that if multiple walks could truly run in parallel, the 1 second penalty would not hit me so much. I do run spine with multiple threads and a low number of OIDs per snmpget and this would make the 1 second delay a non-issue.

Right now, polling approximately 220 counters (some every 5 minutes, some every minute), I'm getting close to my time limit before spine decides to drop some OIDs from the list. I'm not even polling many per-second counters yet.

Is it possible to make the DLL use threads or another method of parallelism and/or some workaround to the 1 second delay?

Thanks!
Antoine

PS: I suppose having your DLL process the counters.ini file once at startup, then starting a bunch of threads in the background just to poll all the counters at regular intervals (even if no SNMP request is coming) while updating some piece of shared memory, and having a main thread called by snmpd.exe reading the values from the shared memory to provide the responses would be a major rewrite of your application? :) Maybe for v3?
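
The design described in this PS (poll in the background, answer requests from a shared cache) can be sketched as follows. This is a minimal Python illustration, not snmptools code: the class name, the `readers` mapping and the polling interval are all hypothetical, and the slow collection step is replaced by plain callables.

```python
import threading
from typing import Callable, Dict


class CounterCache:
    """Polls counters in a background thread; readers get the last cached value."""

    def __init__(self, readers: Dict[str, Callable[[], float]], interval: float):
        self._readers = readers      # OID -> function returning the current value
        self._interval = interval    # seconds between polling rounds
        self._values: Dict[str, float] = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._poll_once()            # fill the cache before serving requests
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def get(self, oid: str) -> float:
        # Request path: returns immediately, never waits on a collection delay.
        with self._lock:
            return self._values[oid]

    def _poll_once(self) -> None:
        fresh = {oid: read() for oid, read in self._readers.items()}
        with self._lock:
            self._values.update(fresh)

    def _run(self) -> None:
        # Event.wait returns True once stop() is called, which ends the loop.
        while not self._stop.wait(self._interval):
            self._poll_once()


cache = CounterCache({"1.3.6.1.4.1.15.1": lambda: 42.0}, interval=0.05)
cache.start()
print(cache.get("1.3.6.1.4.1.15.1"))
cache.stop()
```

The trade-off is that a request may see a value up to one polling interval old, in exchange for never paying the collection delay on the request path.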
erwan.l
Cacti User
Posts: 138
Joined: Tue Jan 22, 2008 4:36 am

Post by erwan.l »

Hi Flukester,
Indeed, with so many counters I believe we are pushing performance quite far.

The main issue is with this Windows API: PdhCollectQueryData.

I call it for every SNMP request.
I call it twice so that Windows can calculate counters which need an interval.

I need to check whether the one-second delay is really necessary.
Maybe a 1 ms delay between the two calls is OK.

Also, I'll see whether I need to open and close the PDH library for each SNMP request.
If I keep it open for the life of the SNMP agent, I would not need to call PdhCollectQueryData twice for every request.

It is explained better here: http://support.microsoft.com/kb/262938
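
Why two collections are needed for a per-second value can be illustrated with a simplified model. The sketch below is an assumption-laden Python illustration, not the actual PDH computation (which depends on the counter type): it treats a rate counter as a cumulative raw count and divides the difference between two samples by the elapsed time. The `Sample` type and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    raw: int    # cumulative raw count at sampling time
    t: float    # timestamp in seconds


def per_second(first: Sample, second: Sample) -> float:
    """Rate between two samples; dividing by elapsed time scales out the interval."""
    elapsed = second.t - first.t
    if elapsed <= 0:
        raise ValueError("second sample must be taken after the first")
    return (second.raw - first.raw) / elapsed


# The same underlying rate observed over a 1 s and a 0.25 s interval.
print(per_second(Sample(1000, 0.0), Sample(1200, 1.0)))
print(per_second(Sample(1000, 0.0), Sample(1050, 0.25)))
```

In this model a shorter interval gives the same rate when activity is steady, but it averages over less time, so bursty counters will look noisier.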

Regards,
Erwan
Stone_ll
Posts: 16
Joined: Thu Mar 05, 2009 2:15 am

Post by Stone_ll »

Stone_ll wrote:
erwan.l wrote:
Hi Stone_ll,
What OS and snmptools version are you running?

Regards,
Erwan.

OS: Windows 2003 Enterprise 32-bit
snmptools version: 2.0.0.10, last modified on Feb 17th 2009

PS: The graphs in Cacti now look like this:

Hi Erwan,
I have sent the new logs, created by the new version of snmptools, to your mailbox.
But there is no change in the Cacti graphs.
For example, these graphs:
Attachments: kunming.jpg, dalian.jpg, nanjing.jpg
erwan.l
Cacti User
Posts: 138
Joined: Tue Jan 22, 2008 4:36 am

Post by erwan.l »

Hello Stone_ll,

I took a look at the log files and I don't see any errors.
I do see that from time to time the executable you are using outputs an empty string, hence the white gaps in your graphs.

Could it be that your exe sometimes fails and returns nothing?

Download the latest snmptools build from last night and give it another try: it has some more debugging details which could help.

If you have access to your devs, here is (in short) how I retrieve the exe output:

// hide the application window
myStartupInfo.dwFlags := STARTF_USESHOWWINDOW;
myStartupInfo.wShowWindow := SW_HIDE;
// redirect the standard handles to pipes
myStartupInfo.dwFlags := myStartupInfo.dwFlags or STARTF_USESTDHANDLES;
myStartupInfo.hStdInput := 0;
myStartupInfo.hStdOutput := hPipeOutputWrite;
myStartupInfo.hStdError := hPipeErrorWrite;

// bInheritHandles is True so the child process can write to the pipes
Result := CreateProcess(nil, PChar(CmdLine), nil, nil, True,
  CREATE_NEW_CONSOLE, nil, nil, myStartupInfo, myProcessInfo);

if Result then
begin
  output := ...
end;

Maybe this could help your devs to fix that issue?
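
For testing an executable outside of snmptools, the "succeeds but prints nothing" case mentioned above can be caught explicitly. The following is a small Python sketch, not a translation of the Delphi fragment: it uses the standard `subprocess` module, and the function name and timeout are arbitrary choices for illustration.

```python
import subprocess
import sys
from typing import List


def run_and_capture(cmd: List[str], timeout: float = 10.0) -> str:
    """Run cmd and return its stdout, raising if it fails or prints nothing."""
    result = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,   # the child gets no input
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        timeout=timeout,
    )
    output = result.stdout.strip()
    if result.returncode != 0:
        raise RuntimeError(f"exit code {result.returncode}: {result.stderr.strip()}")
    if not output:
        raise RuntimeError("command succeeded but printed nothing on stdout")
    return output


# Stand-in for the real executable: a child Python process that prints a value.
print(run_and_capture([sys.executable, "-c", "print(42)"]))
```

Running the real executable through a wrapper like this in a loop would show whether the empty results come from the program itself or from how its output is collected.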

Regards,
Erwan.
Stone_ll
Posts: 16
Joined: Thu Mar 05, 2009 2:15 am

Post by Stone_ll »

OK, I'll give it a try.
Thanks a lot.
flukester
Posts: 17
Joined: Fri Jan 16, 2009 3:01 pm
Location: Montreal, QC, Canada

Post by flukester »

Hello!

I just wanted to let everyone know that it is fully working now, at least for me. I have quite a large counters.ini file now, over 1000 counters defined. I am polling some of them every 5 minutes, some of them every 1 minute.

The key change was made by Erwan in version 2.0.0.14 or 2.0.0.15, where the delay can now be specified by adding a DWORD entry named collect_delay in the registry. If this value is not specified, the default is 1000 ms. You can set it to another value if required.

In my case, I am currently running with 50ms (50 decimal or 0x32 in hex).
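
As a quick check of the decimal/hex conversion, here is a minimal Python sketch that formats the value the way a .reg file writes a DWORD. The registry key under which collect_delay lives is not given in this thread, so it is deliberately left out; the helper name is hypothetical.

```python
def reg_dword(value: int) -> str:
    """Format an integer the way a .reg file writes a DWORD value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 unsigned bits")
    return f"dword:{value:08x}"


# Default delay and the 50 ms setting mentioned above.
for delay_ms in (1000, 50):
    print(f'"collect_delay"={reg_dword(delay_ms)}')
```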

Thanks again to Erwan for this very useful tool!

Antoine
jaywardhan
Posts: 2
Joined: Mon Mar 23, 2009 9:45 am

Post by jaywardhan »

Hi Erwan,

It is indeed a great tool. I installed it successfully on a Windows 2003 x64 server. I even managed to get "iso.3.6.1.4.1.15 = STRING: "snmptools by erwan.l@free.fr"" when doing an snmpget on the above OID.

I have also configured my counters.ini file as below:
[1.3.6.1.4.1.15.1]
counter=PhysicalDisk\Avg. Disk Queue Length\_Total
[1.3.6.1.4.1.15.2]
counter=LogicalDisk\Free Megabytes\_Total
[1.3.6.1.4.1.15.3]
counter=LogicalDisk\% Free Space\C:
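
The log in this post shows the ini entry `LogicalDisk\% Free Space\C:` being turned into the PDH path `\LogicalDisk(C:)\% Free Space`. The sketch below reproduces that mapping in Python as an illustration only: the rule is inferred from that single log line, the handling of two-part entries is an assumption, and `to_pdh_path` is a hypothetical name, not an snmptools function.

```python
import configparser


def to_pdh_path(counter: str) -> str:
    r"""Turn 'Object\Counter\Instance' into '\Object(Instance)\Counter'."""
    parts = counter.split("\\")
    if len(parts) == 3:
        obj, name, instance = parts
        return f"\\{obj}({instance})\\{name}"
    if len(parts) == 2:
        # Assumption: entries without an instance are passed through as-is.
        obj, name = parts
        return f"\\{obj}\\{name}"
    raise ValueError(f"unexpected counter format: {counter!r}")


SAMPLE = r"""
[1.3.6.1.4.1.15.1]
counter=PhysicalDisk\Avg. Disk Queue Length\_Total
[1.3.6.1.4.1.15.3]
counter=LogicalDisk\% Free Space\C:
"""

# interpolation=None keeps the literal '%' in counter names from being parsed.
ini = configparser.ConfigParser(interpolation=None)
ini.read_string(SAMPLE)
for oid in ini.sections():
    print(oid, "->", to_pdh_path(ini[oid]["counter"]))
```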

Doing an snmpwalk on any of the OIDs above returns a value of 0:
"iso.3.6.1.4.1.15.3 = Counter64: 0"

Below is the log file data:

09:59:22:609 , SnmpExtensionQueryEx
09:59:22:609 , nRequestType=SNMP_EXTENSION_GET
09:59:22:609 , GetRequest: OID=1.3.6.1.4.1.15.3 (8)
09:59:22:609 , GetPerf: path=LogicalDisk\% Free Space\C:
09:59:22:609 , GetPerf: pdh_counter_path=\LogicalDisk(C:)\% Free Space
09:59:23:609 , GetPerf: vartype=Int64
09:59:23:609 , GetRequest: value=231928233984 asn_type=70
09:59:23:609 , GetRequest: OK
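
The `asn_type=70` in the log can be read as a one-byte BER tag. The Python sketch below splits it into class and tag number and maps the SNMP application tags to their usual names; the function and table names are made up for illustration.

```python
APPLICATION_TYPES = {
    0: "IpAddress",
    1: "Counter32",
    2: "Gauge32",
    3: "TimeTicks",
    4: "Opaque",
    6: "Counter64",
}
CLASSES = ("universal", "application", "context-specific", "private")


def describe_asn_type(asn_type: int) -> str:
    """Split a one-byte BER tag into its class and tag number."""
    tag_class = CLASSES[(asn_type >> 6) & 0x3]
    number = asn_type & 0x1F
    if tag_class == "application":
        return APPLICATION_TYPES.get(number, f"application {number}")
    return f"{tag_class} {number}"


print(describe_asn_type(70))
# The logged value does not fit in 32 bits, so a 64-bit type is needed.
print(231928233984 > 2**32 - 1)
```

So the agent logs a non-zero 64-bit value tagged as Counter64, while the walk shows 0. That suggests the value is lost somewhere after the GetRequest step, though the thread does not establish where.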

I have also observed the SNMP service occasionally going into the stopped state when I do an snmpwalk.

Please advise.
jaywardhan
Posts: 2
Joined: Mon Mar 23, 2009 9:45 am

Post by jaywardhan »

Hi,

I further explored the option of using the 32-bit snmptools on a 64-bit Windows server and am now able to snmpwalk my custom OIDs successfully.

It's a wonderful tool indeed.

Thanks and Regards,
Jay
leobeach
Posts: 1
Joined: Mon Mar 30, 2009 1:31 pm
Location: Montreal, Canada

Post by leobeach »

flukester wrote:
The key change was made by Erwan in version 2.0.0.14 or 2.0.0.15, where the delay can now be specified by adding a DWORD entry named collect_delay in the registry. If this value is not specified, the default is 1000 ms. You can set it to another value if required.

Hi,

I have tried snmptools to monitor Windows servers with Cacti, and I would like to know where I can find version 2.0.0.15.
The snmptools2.zip from the iptools webpage is still v2.0.0.14 (the history.txt in it only mentions 2.0.0.14).


Thanks,
Leo
erwan.l
Cacti User
Posts: 138
Joined: Tue Jan 22, 2008 4:36 am

Post by erwan.l »

Hello leobeach,
2.0.0.14 is the latest one (I did not increment the version number during my last compilation).

Regards,
Erwan.
stormonts
Cacti User
Posts: 349
Joined: Tue Mar 31, 2009 10:05 am

Post by stormonts »

When I try to use snmptools to graph a SQL data file larger than 1 GB, either my hard drive space graph or my physical memory graph breaks.

I just tried to install 2.0.0.14 but it didn't help. I'll try and give as much info as I can, but please let me know what else you need.

MSSQL 2005 (and snmptools) are running on a Windows 2003 r2 Standard x64 server. SNMPtools is located in c:\snmptools, snmptools.dll was copied to c:\windows, and "regagentWow6432.reg" was executed.

We are running Cacti 0.8.7b.

Used Hard drive space is using "Host MIB - Hard Drive Space" and "Get Script Server Data (Indexed)" as the data input method.

From the counters.ini file that I created, this is one of the items giving me an issue:

[1.3.6.1.4.1.15.26]
counter=MSSQL$WSUS:Databases(SUSDB)\Data File(s) Size (KB)

Snmpwalk of that OID returns:
SNMPv2-SMI::enterprises.15.26 = INTEGER: 1384128

If I add that data source and graph to cacti, the Hard drive space graph for that server reports "0" for the total. Debug for that graph shows:

/usr/bin/rrdtool create \
/usr/local/cacti-0.8.7b/rra/hawk_hdd_total_2718.rrd \
--step 300 \
DS:hdd_used:GAUGE:600:0:U \
DS:hdd_total:GAUGE:600:0:U \
RRA:AVERAGE:0.5:1:600 \
RRA:AVERAGE:0.5:6:700 \
RRA:AVERAGE:0.5:24:775 \
RRA:AVERAGE:0.5:288:797 \
RRA:MIN:0.5:1:600 \
RRA:MIN:0.5:6:700 \
RRA:MIN:0.5:24:775 \
RRA:MIN:0.5:288:797 \
RRA:MAX:0.5:1:600 \
RRA:MAX:0.5:6:700 \
RRA:MAX:0.5:24:775 \
RRA:MAX:0.5:288:797 \
RRA:LAST:0.5:1:600 \
RRA:LAST:0.5:6:700 \
RRA:LAST:0.5:24:775 \
RRA:LAST:0.5:288:797 \

If I delete that data source, the graph returns to normal. I am successfully able to graph the size of MSSQL data files smaller than 1GB using the same method, but both of the ones that I have over 1GB cause the same behavior.
Attachments: 1.jpg
gatorfreak
Posts: 16
Joined: Fri Mar 27, 2009 11:09 am

Post by gatorfreak »

I've set up 4 OIDs in the counters.ini file and Cacti is able to read the first one but gives the following error message for all subsequent OIDs:

Code: Select all

CMDPHP: Poller[0] Host[4] DS[251] WARNING: Result from SNMP not valid. Partial Result: U

Using snmpget, I get valid values from all OIDs in the ini file.

Here is what is in my ini file:

Code: Select all

[1.3.6.1.4.1.15.1]
counter=HP EVA Host Port Statistics\Read KB/s\EVA01:500J:FP1

[1.3.6.1.4.1.15.2]
counter=HP EVA Host Port Statistics\Read KB/s\EVA01:500J:FP2

[1.3.6.1.4.1.15.3]
counter=HP EVA Host Port Statistics\Read KB/s\EVA01:D01N:FP1

[1.3.6.1.4.1.15.4]
counter=HP EVA Host Port Statistics\Read KB/s\EVA01:D01N:FP2

Any ideas why Cacti can only read the first one?
Stone_ll
Posts: 16
Joined: Thu Mar 05, 2009 2:15 am

Post by Stone_ll »

Stone_ll wrote:
OK, I'll give it a try.
Thanks a lot.

I have run into the same problem again, just like before.
I am using version 2.0.0.14. Could you give me any other suggestions? Thanks.