Disk Graphs show totals as "nan" now

Post by **TheWitness** » Wed Apr 28, 2010 5:31 pm

The heartbeat on the second Data Source is too low. It should be 600, and instead it's 120. I suspect someone attempted to do 1 minute polling and therefore your RRDs are damaged. This must have happened when they were created. Maybe at one time you were polling at 1 minute. This is pretty hard to fix.

You need to modify the Data Template to return the heartbeats to 600, and then go to www.rrdtool.org to research how to change that heartbeat. You may end up having to use a contrib script to do it.
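To see which data sources are affected, note that `rrdtool info` prints one `ds[<name>].minimal_heartbeat = <seconds>` line per data source. Below is a minimal sketch that filters that output for heartbeats under a threshold. The function name `low_heartbeats` is made up for illustration, and it only reads text on stdin, so it changes nothing.

```shell
#!/bin/sh
# low_heartbeats MIN
# Read `rrdtool info` output on stdin and print "ds-name heartbeat"
# for every data source whose minimal_heartbeat is below MIN.
# (Hypothetical helper, not part of rrdtool or Cacti.)
low_heartbeats() {
    min="$1"
    # Keep only the heartbeat lines and reduce them to "name value".
    sed -n 's/^ds\[\(.*\)\]\.minimal_heartbeat = \([0-9][0-9]*\) *$/\1 \2/p' |
    while read -r name hb; do
        if [ "$hb" -lt "$min" ]; then
            echo "$name $hb"
        fi
    done
}

# Usage against a real file (requires rrdtool; the file name is a placeholder):
#   rrdtool info your_disk_file.rrd | low_heartbeats 600
```

Any data source it lists is one whose heartbeat would need to be raised.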

TheWitness

Post by **trent1980** » Wed Apr 28, 2010 5:36 pm

Yes I do have 1 minute polling ...

600 on the total like it is on used?

I don't need to fix it retroactively ... if I could just get it working going forward, I'd be good.

If I need to start over, it's not that big of a deal for me to keep a copy of the server at this point for old stuff (it's on a VM).

Post by **deepsys** » Thu Apr 29, 2010 6:04 am

Hello,

After upgrading from 0.8.7b to 0.8.7e with all patches, I have the same problem.

I made a debug Cacti log, and it seems that the first HDD that is polled causes the problem.

I have one server where I have the problem with HDD C:,
and another that has the problem with HDD D:.

Please take a look at the PDF.

Post by **TheWitness** » Thu Apr 29, 2010 6:25 am

Then, fix the problem. If it's the same problem, you can follow the instructions that I provided to correct the issue.

TheWitness

Post by **deepsys** » Thu Apr 29, 2010 7:18 am

TheWitness wrote:Then, fix the problem. If it's the same problem, you can follow the instructions that I provided to correct the issue.

TheWitness

Sorry, but which fix do you mean?

My heartbeat time is 600 ...

Post by **trent1980** » Thu Apr 29, 2010 9:13 am

TheWitness wrote: The heartbeat on the second Data Source is too low. It should be 600, and instead it's 120. I suspect someone attempted to do 1 minute polling and therefore your RRDs are damaged. This must have happened when they were created. Maybe at one time you were polling at 1 minute. This is pretty hard to fix.

You need to modify the Data Template to return the heartbeats to 600, and then go to www.rrdtool.org to research how to change that heartbeat. You may end up having to use a contrib script to do it.

TheWitness

Thanks ... Just in case someone else comes across this: I manually updated each of my disk RRDs, and they are reporting/graphing the totals now.

A sample of the "tune" command I had to run:
rrdtool tune ssimsapp00_hdd_used_105.rrd -h hdd_total:600
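
With many disk RRDs, the same command can be looped. A rough sketch, assuming every disk RRD sits in one directory, matches `*_hdd_*.rrd` (a guess based on the file name above), and has a data source named `hdd_total`. The function name `tune_heartbeats` and the `rra` path in the usage comment are hypothetical. It only prints the commands unless `run` is passed, so they can be reviewed first, and backing up the RRD directory beforehand is a sensible precaution.

```shell
#!/bin/sh
# tune_heartbeats DIR DS HEARTBEAT [run]
# Print one `rrdtool tune` command per matching RRD file in DIR,
# or execute it when the 4th argument is "run".
# The *_hdd_*.rrd glob is an assumption taken from the sample file name.
tune_heartbeats() {
    dir="$1"; ds="$2"; hb="$3"; mode="${4:-dry}"
    for f in "$dir"/*_hdd_*.rrd; do
        [ -e "$f" ] || continue   # the glob matched nothing
        if [ "$mode" = "run" ]; then
            # -h is the short form of --heartbeat ds-name:seconds
            rrdtool tune "$f" -h "$ds:$hb"
        else
            echo "rrdtool tune $f -h $ds:$hb"
        fi
    done
}

# Dry run first, then repeat with "run" (the path is hypothetical):
#   tune_heartbeats /var/www/html/cacti/rra hdd_total 600
#   tune_heartbeats /var/www/html/cacti/rra hdd_total 600 run
```

Files that match the glob but have no `hdd_total` data source will make `rrdtool tune` report an error for that file, so the dry-run list is worth checking.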

Thanks again for the help

Post by **trent1980** » Thu Apr 29, 2010 10:31 am

Out of the 40 or so disk checks, all are working as normal except 5 (C:\ only ... the other drives are reporting normal)

I reset the poller cache and now I get "totals" but not "used". The "used" space on those 5 all show "0". I can't find anything different about these 5 from the others (same templates ...)

PHP Script Server has Started - Parent is cmd
/var/www/html/cacti/scripts/ss_host_disk.php ss_host_disk 192.168.167.238 5 1:161:500:1:10:public:::MD5::DES: get used 2
15579299840
/var/www/html/cacti/scripts/ss_host_disk.php ss_host_disk 192.168.167.238 5 1:161:500:1:10:public:::MD5::DES: get total 2
21476171776
/var/www/html/cacti/scripts/ss_host_disk.php ss_host_disk 192.168.167.238 5 1:161:500:1:10:public:::MD5::DES: get used 2
0
04/29/2010 08:27:17 AM - PHPSVR: Poller[0] Maximum runtime of 52 seconds exceeded for the Script Server. Exiting.

Post by **trent1980** » Thu Apr 29, 2010 10:50 am

Increased the SNMP timeout to 3000 ms on those 5 hosts, and I'm getting the used data now on 4 of the 5.

Post by **trent1980** » Thu Apr 29, 2010 11:26 am

Increased the 5th device to a 5000 ms timeout and it's graphing now.

Is that normal?

Post by **gandalf** » Thu Apr 29, 2010 4:13 pm

trent1980 wrote:increased the 5th device to 5000 ms timeout and it's graphing now

is that normal?

What do you expect from a Windows system?
Unfortunately, SNMP is very slow there. I don't know whether it does any better with net-snmp or SNMP Informant instead.
R.

Post by **TheWitness** » Thu Apr 29, 2010 8:52 pm

Yeah, my comment about MS' implementation is that it's designed to fail. There is too much overhead with regard to the disk stuff.

The reason is that when you query one disk, it refreshes the data for all disks. If some disks are not mounted (an A: drive, who has those any more?), it just waits for each of them to time out. They need to give it an overhaul.

TheWitness

Post by **deepsys** » Fri Apr 30, 2010 12:50 am

trent1980 wrote: increased the snmp timeout to 3000 on those 5 hosts and getting the used data now on 4 of the 5.

That works!

But I can't understand why this failure started after the Cacti update.

OK, it works.
Thanks for your help!
