Windows 2008R2 Drives in TB & PB don't graph correctly


animalmother
Posts: 18
Joined: Tue Jan 25, 2011 7:04 pm

Windows 2008R2 Drives in TB & PB don't graph correctly

Post by animalmother »

Running
cacti-0.8.7g
rrdtool-1.4.4
net-snmp-5.3.2

Cacti server is on 64bit Linux (Citrix) and I am polling Windows 2008R2 clients.

Using the "SNMP - Get Mounted Partitions" Data Query, all my drives in the TB range graph far too low. I actually have some drives in the PB range. Has anyone dealt with this? I am sure it is either the math being done on the returned data, or the OIDs are not correct.

Drilling down, I can find the OIDs that are being pulled (ss_host_disk.php):
"total" => ".1.3.6.1.2.1.25.2.3.1.5",
"used" => ".1.3.6.1.2.1.25.2.3.1.6",

On a server with a drive showing 2.6P "Total" and 1.7P "Used", I get this back from an SNMP walk. I'm not sure what to do with the numbers I am seeing. Do I need a CDEF or something to work on drives in the TB and PB range?

snmpwalk -v2c -c public server .1.3.6.1.2.1.25.2.3.1.5
HOST-RESOURCES-MIB::hrStorageSize.1 = INTEGER: 35808511
HOST-RESOURCES-MIB::hrStorageSize.2 = INTEGER: 25599
HOST-RESOURCES-MIB::hrStorageSize.3 = INTEGER: -1
HOST-RESOURCES-MIB::hrStorageSize.4 = INTEGER: 1194714880
HOST-RESOURCES-MIB::hrStorageSize.5 = INTEGER: -1
HOST-RESOURCES-MIB::hrStorageSize.6 = INTEGER: -1
HOST-RESOURCES-MIB::hrStorageSize.7 = INTEGER: -876505088
HOST-RESOURCES-MIB::hrStorageSize.8 = INTEGER: -1
HOST-RESOURCES-MIB::hrStorageSize.9 = INTEGER: -1
HOST-RESOURCES-MIB::hrStorageSize.10 = INTEGER: 520667
HOST-RESOURCES-MIB::hrStorageSize.11 = INTEGER: 261966


snmpwalk -v2c -c public server .1.3.6.1.2.1.25.2.3.1.6
HOST-RESOURCES-MIB::hrStorageUsed.1 = INTEGER: 8769806
HOST-RESOURCES-MIB::hrStorageUsed.2 = INTEGER: 7051
HOST-RESOURCES-MIB::hrStorageUsed.3 = INTEGER: 0
HOST-RESOURCES-MIB::hrStorageUsed.4 = INTEGER: 832664506
HOST-RESOURCES-MIB::hrStorageUsed.5 = INTEGER: -1032437094
HOST-RESOURCES-MIB::hrStorageUsed.6 = INTEGER: -366553742
HOST-RESOURCES-MIB::hrStorageUsed.7 = INTEGER: 377902383
HOST-RESOURCES-MIB::hrStorageUsed.8 = INTEGER: -2132223362
HOST-RESOURCES-MIB::hrStorageUsed.9 = INTEGER: 2045323832
HOST-RESOURCES-MIB::hrStorageUsed.10 = INTEGER: 69062
HOST-RESOURCES-MIB::hrStorageUsed.11 = INTEGER: 79844

[Attachment: q.png, what is currently graphing for this drive]
animalmother
Posts: 18
Joined: Tue Jan 25, 2011 7:04 pm

Re: Windows 2008R2 Drives in TB & PB don't graph correctly

Post by animalmother »

I did notice this post

Installation
This is a pure SNMP based replacement of the standard cacti disk usage templates. It requires support by the non-standard HOST MIB, however.
untar the attached resource file
Drop hrStorageTable.xml into your <path_cacti>/resources/snmp_queries/ folder.
Import the cacti087d_data_query_snmp_-_hrstoragetable template via import feature
Add the Data Query to a device.
From the 'Create Graphs for this Host' screen, select the required disks and click Create.

http://docs.cacti.net/usertemplate:data ... disk_usage

Has anyone had luck with this? Does it show drives in the TB and PB range?

Thanks!
Craig
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: Windows 2008R2 Drives in TB & PB don't graph correctly

Post by gandalf »

Negative integers are bad, that's for sure.
But in the host MIB, there is another column defining the chunk size (hrStorageAllocationUnits). That is, the space is computed by counting chunks and multiplying by the chunk size. This is tackled by my SNMP-only solution. Only I fear that multiplying a negative integer by any chunk size won't give you what you expect.

A script would be able to catch that, e.g. interpreting the integer as unsigned. But hey, then the pure SNMP solution would be blown away :oops: :cry:
Perhaps, some CDEF magic can catch that as well (if integer < 0, add some 2**something to that value to turn it into positive integer)
Unfortunately, I do not have access to PB storage, so I can't verify.
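
A minimal sketch of that "add 2**something" idea, written in Python rather than as a CDEF or a Cacti script. It assumes the agent really sent an unsigned 32-bit count that the receiver decoded as signed, so adding 2**32 to a negative value recovers it. The function name `to_unsigned32` is made up for illustration; the sample values are hrStorageSize.7 and hrStorageSize.4 from the walk in the first post.

```python
# Reinterpret a signed 32-bit SNMP INTEGER as unsigned.
# Assumption: the agent wrapped an unsigned count into a signed field.

def to_unsigned32(value: int) -> int:
    """Return value unchanged if non-negative, else add 2**32."""
    return value if value >= 0 else value + 2**32


if __name__ == "__main__":
    # hrStorageSize.7 and hrStorageSize.4 from the snmpwalk output
    for raw in (-876505088, 1194714880):
        print(raw, "->", to_unsigned32(raw))
```

A reading of -1 would map to 4294967295, which may simply mean the agent could not represent the real size at all, so not every negative reading is necessarily recoverable this way.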

Hope this points you to a possible solution.
R.
noname
Cacti Guru User
Posts: 1566
Joined: Thu Aug 05, 2010 2:04 am
Location: Japan

Re: Windows 2008R2 Drives in TB & PB don't graph correctly

Post by noname »

gandalf wrote: Perhaps, some CDEF magic can catch that as well (if integer < 0, add some 2**something to that value to turn it into positive integer)

This?
- http://forums.cacti.net/viewtopic.php?p=90290#p90290

Code:

"convert 32bit signed to unsigned"
cdef=CURRENT_DATA_SOURCE,0,GE,CURRENT_DATA_SOURCE,4294967296,CURRENT_DATA_SOURCE,+,IF

"convert 32bit signed to unsigned, multiply by 1024"
Build on the previous one, and multiply by 1024.

FYI:

HOST-RESOURCES-MIB (including hrStorageTable) reports sizes as 32-bit integer counts of allocation units, so it can't handle disks that need more than 32 bits' worth of blocks.
- http://forums.cacti.net/viewtopic.php?f=21&t=34309
- http://forums.cacti.net/viewtopic.php?f=2&t=35500
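
To put numbers on that limit, here is a small Python sketch. It assumes the count is a signed 32-bit integer, so the largest positive value is 2**31 - 1 allocation units, or 2**32 - 1 if the receiver reinterprets the bits as unsigned. The allocation unit sizes are just example values, and `max_bytes` is an illustrative name.

```python
# Largest volume size that a 32-bit allocation-unit count can describe.

TIB = 2**40


def max_bytes(alloc_unit_bytes: int, signed: bool = True) -> int:
    """Maximum representable size for a given allocation unit size."""
    max_count = 2**31 - 1 if signed else 2**32 - 1
    return max_count * alloc_unit_bytes


if __name__ == "__main__":
    for unit in (4096, 32768, 65536):
        signed_tib = max_bytes(unit) / TIB
        unsigned_tib = max_bytes(unit, signed=False) / TIB
        print(f"{unit:>6} B units: {signed_tib:7.2f} TiB signed, "
              f"{unsigned_tib:7.2f} TiB unsigned")
```

With 64 KiB units even the unsigned reading tops out just under 256 TiB, so a petabyte volume cannot fit no matter how the sign is handled.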

If Net-SNMP (UCD-SNMP) is available on the target host, it seems that dskTable has supported 64-bit counters since v5.5.
Net-SNMP - CHANGES
- [PATCH 2449210]: add 64-bit disk usage statistics to UCD-SNMP-MIB::dskTable
(But I'm not sure whether this also works on the Windows platform.)
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: Windows 2008R2 Drives in TB & PB don't graph correctly

Post by gandalf »

Yep, you're hitting it right and square
R.
animalmother
Posts: 18
Joined: Tue Jan 25, 2011 7:04 pm

Re: Windows 2008R2 Drives in TB & PB don't graph correctly

Post by animalmother »

So I could create CDEFs with:

"convert 32bit signed to unsigned"
cdef=CURRENT_DATA_SOURCE,0,GE,CURRENT_DATA_SOURCE,4294967296,CURRENT_DATA_SOURCE,+,IF

"convert 32bit signed to unsigned, multiply by 1024"
Build on the previous one, and multiply by 1024.


(I'm not sure exactly how to implement those, but I can hack at it. I have made CDEFs before.)

And that would work with the "Windows 2000/XP Host" template or Gandalf's http://docs.cacti.net/usertemplate:data ... disk_usage template? In the end they are pulling the same OIDs, I believe.

We are a media shop and all our drives are in the hundreds of TB or a few PB, which creates unique issues.

Thanks!
Craig
animalmother
Posts: 18
Joined: Tue Jan 25, 2011 7:04 pm

Re: Windows 2008R2 Drives in TB & PB don't graph correctly

Post by animalmother »

Does updating net-snmp on the Cacti server make any difference?

Thanks!
Craig
animalmother
Posts: 18
Joined: Tue Jan 25, 2011 7:04 pm

Re: Windows 2008R2 Drives in TB & PB don't graph correctly

Post by animalmother »

I was able to create
"convert 32bit signed to unsigned"
cdef=CURRENT_DATA_SOURCE,0,GE,CURRENT_DATA_SOURCE,4294967296,CURRENT_DATA_SOURCE,+,IF
[Attachment: Untitled.png]
But I'm not sure how to implement the second one you mention:

"convert 32bit signed to unsigned, multiply by 1024"
Build on the previous one, and multiply by 1024.

Thanks!
Craig
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: Windows 2008R2 Drives in TB & PB don't graph correctly

Post by gandalf »

animalmother wrote: Does updating net-snmp on the Cacti server make any difference?

Can't tell. Both are needed: the target system has to provide 64-bit data, and the Cacti system has to "print" it.
R.
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: Windows 2008R2 Drives in TB & PB don't graph correctly

Post by gandalf »

animalmother wrote: Was able to create
"convert 32bit signed to unsigned"
cdef=CURRENT_DATA_SOURCE,0,GE,CURRENT_DATA_SOURCE,4294967296,CURRENT_DATA_SOURCE,+,IF
But not sure how to implement the second you mention?

"convert 32bit signed to unsigned, multiply by 1024"
Build on the previous one, and multiply by 1024.

When creating a new CDEF, you can refer to an existing CDEF and then multiply that by 1024. Of course, you can also put both calculations into a single CDEF.
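
A combined CDEF can be sanity-checked outside Cacti. The Python sketch below is a toy evaluator for only the RPN operators used here (GE, +, *, IF), following rrdtool's stack semantics. It is not rrdtool's parser, and the names `eval_rpn` and `COMBINED` are illustrative. It uses 4294967296 (2**32) as the offset, and the token `v` stands in for CURRENT_DATA_SOURCE.

```python
# Toy RPN evaluator for the handful of operators in the CDEF below.

COMBINED = "v,0,GE,v,v,4294967296,+,IF,1024,*"


def eval_rpn(expr: str, v: float) -> float:
    """Evaluate expr with the data source value bound to the token 'v'."""
    stack = []
    for token in expr.split(","):
        if token == "v":
            stack.append(v)
        elif token == "GE":
            b, a = stack.pop(), stack.pop()
            stack.append(1.0 if a >= b else 0.0)
        elif token in ("+", "*"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if token == "+" else a * b)
        elif token == "IF":
            # cond,if_true,if_false,IF: the last pushed item is the else branch
            if_false, if_true, cond = stack.pop(), stack.pop(), stack.pop()
            stack.append(if_true if cond != 0 else if_false)
        else:
            stack.append(float(token))
    return stack.pop()


if __name__ == "__main__":
    print(eval_rpn(COMBINED, -1))      # negative input gets 2**32 added
    print(eval_rpn(COMBINED, 1000))    # non-negative input is only scaled
```
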
R.
animalmother
Posts: 18
Joined: Tue Jan 25, 2011 7:04 pm

Re: Windows 2008R2 Drives in TB & PB don't graph correctly

Post by animalmother »

When I try that CDEF I just get poo. Not sure if this is possible. Maybe have to resort to WMI or something. But would like to make this work.
candlerb
Posts: 10
Joined: Tue Nov 16, 2010 6:38 am

Re: Windows 2008R2 Drives in TB & PB don't graph correctly

Post by candlerb »

I think I've cracked this.

(1) I modified the existing CDEF for "Host MIB - hrStorageTable Units" along the suggested lines:
[Attachment: CDEF_hrStorageTable_units.png]
However, a device with more than 2^31 blocks in total still showed "-Nan" for the total size. With graph debug mode turned on, the generated rrdtool command looked fine: the new CDEFs are being incorporated.

Code:

/usr/bin/rrdtool graph - \
--imgformat=PNG \
--start=-86400 \
--end=-300 \
--title='    thecus1 - hrStorageTable - /raid0/data' \
--base=1000 \
--height=120 \
--width=600 \
--alt-autoscale-max \
--lower-limit=0 \
--vertical-label='Storage' \
--slope-mode \
--font TITLE:12: \
--font AXIS:8: \
--font LEGEND:10: \
--font UNIT:8: \
DEF:a="/var/lib/cacti/rra/thecus1_hrstoragesize_544.rrd":hrStorageSize:AVERAGE \
DEF:b="/var/lib/cacti/rra/thecus1_hrstoragesize_544.rrd":hrStorageUsed:AVERAGE \
CDEF:cdefa=a,0,GE,a,a,4294967296,+,IF,4096,* \
CDEF:cdefe=b,0,GE,b,b,4294967296,+,IF,4096,* \
LINE1:cdefa#FF0000FF:"Total Size (Bytes)"  \
GPRINT:cdefa:LAST:"Current\:%8.2lf%s"  \
GPRINT:cdefa:AVERAGE:"Average\:%8.2lf%s"  \
GPRINT:cdefa:MAX:"Maximum\:%8.2lf%s\n"  \
AREA:cdefe#0000FFFF:"Used Size (Units)"  \
GPRINT:cdefe:LAST:" Current\:%8.2lf%s"  \
GPRINT:cdefe:AVERAGE:"Average\:%8.2lf%s"  \
GPRINT:cdefe:MAX:"Maximum\:%8.2lf%s\n" 
(2) It seems the problem is in the data template: 0 for maximum means 'unlimited', but 0 for minimum means 'must not go below 0', so the negative raw values never reach the RRD.

So I opened the data template for "Hard Drive Space" and set the minimum value for hdd_total and hdd_used to -2147483648 (if using the hrStorageSize template then the parameters are "hrStorageSize" and "hrStorageUsed")

Data source debug mode now shows this:

Code:

/usr/bin/rrdtool create \
/var/lib/cacti/rra/thecus1_hrstoragesize_544.rrd \
--step 300  \
DS:hrStorageSize:GAUGE:600:-2147483648:U \
DS:hrStorageUsed:GAUGE:600:-2147483648:U \
RRA:AVERAGE:0.5:1:500 \
RRA:AVERAGE:0.5:1:600 \
RRA:AVERAGE:0.5:6:700 \
RRA:AVERAGE:0.5:24:775 \
RRA:AVERAGE:0.5:288:797 \
RRA:MAX:0.5:1:500 \
RRA:MAX:0.5:1:600 \
RRA:MAX:0.5:6:700 \
RRA:MAX:0.5:24:775 \
RRA:MAX:0.5:288:797
(3) Finally, existing rrd files need to have the minimum changed from 0 to -2147483648.

Code:

# rrdtool tune /var/lib/cacti/rra/thecus1_hrstoragesize_544.rrd --minimum hrStorageSize:-2147483648
# rrdtool tune /var/lib/cacti/rra/thecus1_hrstoragesize_544.rrd --minimum hrStorageUsed:-2147483648
(Or "hdd_used" and "hdd_total" if using the hdd_used approach)

And hey presto, it works.
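
Why the minimum matters can be shown with a toy model. rrdtool stores any sample outside a data source's min/max range as unknown, which is what surfaces as NaN on the graph. The Python sketch below only imitates that rule and does not touch real RRD files; the function name `accept_sample` is made up.

```python
import math

# Toy model of the RRD data source range check (not rrdtool itself).


def accept_sample(value, minimum=None, maximum=None):
    """Return value if within [minimum, maximum], else NaN (unknown)."""
    if minimum is not None and value < minimum:
        return math.nan
    if maximum is not None and value > maximum:
        return math.nan
    return float(value)


if __name__ == "__main__":
    raw = -876505088  # a wrapped 32-bit reading
    print(accept_sample(raw, minimum=0))            # rejected as unknown
    print(accept_sample(raw, minimum=-2147483648))  # kept for the CDEF to fix
```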

I'd say this is all a frig though, because I would rather clean the data before writing it into the rrd, not when displaying it. I would guess it will seriously mess up averages if the value crosses the 2^31 value.
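
To make the averaging worry concrete, here is a small Python sketch. It assumes two consecutive samples that straddle 2^31 and get consolidated by a plain average before any CDEF runs. The sample values are invented for illustration, and the helper names are hypothetical.

```python
# Converting after averaging is not the same as averaging after converting.


def wrap32(count: int) -> int:
    """What a signed 32-bit decoder makes of an unsigned count."""
    return count if count < 2**31 else count - 2**32


def to_unsigned32(value: float) -> float:
    """Graph-time fix: add 2**32 to negative values."""
    return value if value >= 0 else value + 2**32


def average(values):
    """Plain arithmetic mean, standing in for RRD consolidation."""
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    true_counts = [2147483000, 2147484000]        # straddles 2**31
    stored = [wrap32(c) for c in true_counts]     # what lands in the RRD

    print("stored (signed):  ", stored)
    print("average, then fix:", to_unsigned32(average(stored)))
    print("fix, then average:", average(to_unsigned32(s) for s in stored))
```

The two results differ by roughly 2^31, which is why cleaning the value before it is written looks like the safer place for the fix.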

So I think the right solution would be to override the INTEGER32 value collected by snmp and force it to be treated as UNSIGNED32 - then all the above becomes unnecessary. Is this possible?
candlerb
Posts: 10
Joined: Tue Nov 16, 2010 6:38 am

Re: Windows 2008R2 Drives in TB & PB don't graph correctly

Post by candlerb »

Update: the above doesn't work when polling for hdd_total/hdd_used instead of hrStorageSize/hrStorageUsed. I still see -Nan for the total storage size (and I expect I would see -Nan for storage used, if it exceeded 2^31 blocks)

The difference appears to be in the data queries. hdd_total/hdd_used are coming from "SNMP - get mounted partitions" and are using "Get script_server data (indexed)"; whereas hrStorageSize/hrStorageUsed are coming from "SNMP - hrStorageTable" using "Get SNMP data (indexed)"

If I run the collection script by hand it retrieves a negative value as expected:

Code:

# php /usr/share/cacti/site/script_server.php
PHP Deprecated:  Function split() is deprecated in /usr/share/cacti/site/script_server.php on line 43
PHP Script Server has Started - Parent is cmd
/usr/share/cacti/site/scripts/ss_host_disk.php ss_host_disk monster1 5 2:161:2:2:200:public get used 2
32513305083904
/usr/share/cacti/site/scripts/ss_host_disk.php ss_host_disk monster1 5 2:161:2:2:200:public get total 2
-20736374767616
However, this time the value seems to be in bytes, not blocks. This means that my lower limit of -2147483648 is not low enough, and that's why I still see -Nan.

Unfortunately, it also means that if I were to write a CDEF to correct this, the value to add will depend on the block size, i.e. I would have to add |hrStorageAllocationUnits| * 2^32. For example, on this volume the hrStorageAllocationUnits is 32768, so the true storage size of this volume is

-20736374767616 + (2 ** 32 * 32768) = 120001113587712 = 109TB
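
The arithmetic can be double-checked in a few lines of Python. This is only a sketch: it assumes the script multiplied a wrapped signed 32-bit count by hrStorageAllocationUnits, so the correction in bytes is 2**32 times the unit size. The function name `fix_wrapped_bytes` is illustrative.

```python
# Undo a 32-bit sign wrap on a value that was already multiplied
# by the allocation unit size.


def fix_wrapped_bytes(reported_bytes: int, alloc_unit_bytes: int) -> int:
    """Add 2**32 allocation units' worth of bytes to a negative size."""
    if reported_bytes >= 0:
        return reported_bytes
    return reported_bytes + 2**32 * alloc_unit_bytes


if __name__ == "__main__":
    total = fix_wrapped_bytes(-20736374767616, 32768)
    print(total, "bytes")
    print(round(total / 2**40, 2), "TiB")
```

The 109TB figure is in binary units: 120001113587712 / 2**40 is about 109.14.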

However, since the code which collects the data is in PHP, it's easy to modify, so a better solution is just to sort out the problem at source:

Code:

--- ss_host_disk.php.orig	2011-10-25 12:09:53.877226095 +0100
+++ ss_host_disk.php	2011-10-25 12:11:31.265217758 +0100
@@ -80,7 +80,9 @@
 
 		if (($arg == "total") || ($arg == "used")) {
 			$sau = eregi_replace("[^0-9]", "", db_fetch_cell("select field_value from host_snmp_cache where host_id=$host_id and field_name='hrStorageAllocationUnits' and snmp_index='$index'"));
-			return cacti_snmp_get($hostname, $snmp_community, $oids[$arg] . ".$index", $snmp_version, $snmp_auth_username, $snmp_auth_password, $snmp_auth_protocol,$snmp_priv_passphrase,$snmp_priv_protocol, $snmp_context, $snmp_port, $snmp_timeout, $ping_retries, SNMP_POLLER)* $sau;
+			$tmp = cacti_snmp_get($hostname, $snmp_community, $oids[$arg] . ".$index", $snmp_version, $snmp_auth_username, $snmp_auth_password, $snmp_auth_protocol,$snmp_priv_passphrase,$snmp_priv_protocol, $snmp_context, $snmp_port, $snmp_timeout, $ping_retries, SNMP_POLLER);
+			if ($tmp < 0) { $tmp += 4294967296; }
+			return $tmp * $sau;
 		}else{
 			return cacti_snmp_get($hostname, $snmp_community, $oids[$arg] . ".$index", $snmp_version, $snmp_auth_username, $snmp_auth_password, $snmp_auth_protocol,$snmp_priv_passphrase,$snmp_priv_protocol, $snmp_context, $snmp_port, $snmp_timeout, $ping_retries, SNMP_POLLER);
 		}
With this patch, the script now returns the correct (large) amount of storage space available:

Code:

/usr/share/cacti/site/scripts/ss_host_disk.php ss_host_disk monster1 5 2:161:2:2:200:public get total 2
1.2000111358771E+14
which means the RRD contains the correct data, and finally these graphs are working - whoohoo!
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: Windows 2008R2 Drives in TB & PB don't graph correctly

Post by gandalf »

candlerb wrote: I'd say this is all a frig though, because I would rather clean the data before writing it into the rrd, not when displaying it. I would guess it will seriously mess up averages if the value crosses the 2^31 value.

So I think the right solution would be to override the INTEGER32 value collected by snmp and force it to be treated as UNSIGNED32 - then all the above becomes unnecessary. Is this possible?

This is a good deal of analysis.
To be honest, I'd like to go the hrStorageTable way as it performs way faster than the script stuff.
But the INTEGER32 issue still remains, and I'm not sure whether the root cause is SNMP (BTW: which SNMP code are you using on the target system?). I will have to look at the hrStorageTable standard definition to find out whether it still talks about INTEGER32.

As a workaround, we may apply your "INTEGER32" patch to the standard treatment of SNMP results for SNMP Data Queries, hoping that this won't affect other meaningful data.

R.
candlerb
Posts: 10
Joined: Tue Nov 16, 2010 6:38 am

Re: Windows 2008R2 Drives in TB & PB don't graph correctly

Post by candlerb »

gandalf wrote: BTW: which SNMP code are you using on the target system?

I have a small menagerie of buggy SNMP agents here :-)
  • Windows 2008 server running its standard SNMP agent
  • Thecus N8800PRO running firmware 3.02.01
  • net-snmp 5.4.3 (from Ubuntu 11.04)
The first two of these return hrStorageSize values greater than 2^31 in 4 bytes, which therefore appear negative to the receiver. Neither of them supports dskTable, so you are forced to use hrStorageTable.

net-snmp has a more insidious bug in hrStorageTable. Instead of returning a negative value, it just returns the same hrStorageSize as the previous row!

Code:

$ snmpwalk -v2c -cpublic x.x.x.x hrStorageTable | grep '\.3[34] = '
HOST-RESOURCES-MIB::hrStorageIndex.33 = INTEGER: 33
HOST-RESOURCES-MIB::hrStorageIndex.34 = INTEGER: 34
HOST-RESOURCES-MIB::hrStorageType.33 = OID: HOST-RESOURCES-TYPES::hrStorageFixedDisk
HOST-RESOURCES-MIB::hrStorageType.34 = OID: HOST-RESOURCES-TYPES::hrStorageFixedDisk
HOST-RESOURCES-MIB::hrStorageDescr.33 = STRING: /dev
HOST-RESOURCES-MIB::hrStorageDescr.34 = STRING: /media/data
HOST-RESOURCES-MIB::hrStorageAllocationUnits.33 = INTEGER: 4096 Bytes
HOST-RESOURCES-MIB::hrStorageAllocationUnits.34 = INTEGER: 4096 Bytes
HOST-RESOURCES-MIB::hrStorageSize.33 = INTEGER: 3089665
HOST-RESOURCES-MIB::hrStorageSize.34 = INTEGER: 3089665        <<<< OOPS!
HOST-RESOURCES-MIB::hrStorageUsed.33 = INTEGER: 162
HOST-RESOURCES-MIB::hrStorageUsed.34 = INTEGER: 651026208
So in this example, disk 34 shows as over 20,000% full if you divide hrStorageUsed by hrStorageSize. Reported at
https://bugs.launchpad.net/ubuntu/+sour ... bug/865268
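
A quick Python check of that figure, together with a guard a poller could apply. The guard is only a suggestion for illustration, not something Cacti or net-snmp is known to do, and `percent_used` is a made-up name.

```python
# Percent-used calculation with a sanity check for impossible readings.


def percent_used(used_units: int, size_units: int) -> float:
    """Return used/size as a percentage; raise if the row is inconsistent."""
    if size_units <= 0 or used_units < 0:
        raise ValueError("negative or zero reading, possibly a wrapped counter")
    if used_units > size_units:
        raise ValueError(
            f"used exceeds size ({100.0 * used_units / size_units:.0f}%)"
        )
    return 100.0 * used_units / size_units


if __name__ == "__main__":
    print(percent_used(162, 3089665))        # row 33 is plausible
    try:
        percent_used(651026208, 3089665)     # row 34 from the walk
    except ValueError as err:
        print("rejected:", err)
```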

However with net-snmp you can use dskTable instead, and since dskTable includes a dskPercent used, it's convenient for alerting anyway (e.g. in Nagios). Also I believe net-snmp 5.5+ will provide 64-bit values in dskTable.
gandalf wrote: I will have to look at the hrStorageTable standard definition to find out, if it still talks about INTEGER32

RFC 2790 says it's Integer32. But more importantly, this is how the value is tagged when it is queried via SNMP (the ASN.1 BER-encoded response). For example, tcpdump shows 02 04 da 47 df 7f: 02 = tag 2 (INTEGER, which is signed); 04 = data length; da47df7f is the data, which decodes as negative because the top bit is set. Since the value is explicitly tagged as a signed integer on the wire, changing the MIB definition doesn't make a difference, unfortunately.
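
The decoding can be reproduced in Python. This sketch handles only the simple case shown (an INTEGER tag with a short-form length) and is not a general BER parser; `decode_ber_integer` is an illustrative name.

```python
# Decode a BER-encoded INTEGER (tag 0x02) the way an SNMP receiver would.


def decode_ber_integer(data: bytes) -> int:
    """Decode tag, short-form length and two's-complement content."""
    tag, length = data[0], data[1]
    if tag != 0x02:
        raise ValueError("not an INTEGER")
    if length >= 0x80 or len(data) != 2 + length:
        raise ValueError("unsupported or inconsistent length")
    return int.from_bytes(data[2:], byteorder="big", signed=True)


if __name__ == "__main__":
    raw = bytes.fromhex("0204da47df7f")
    signed = decode_ber_integer(raw)
    print(signed)              # what the receiver sees
    print(signed + 2**32)      # the count the agent presumably meant
```

Multiplied by a 32768-byte allocation unit, the decoded value gives -20736374767616, which matches the negative total that ss_host_disk.php returned before the patch.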

Regards,

Brian.