[SOLVED] Gaps / "nan" returned

khufure
Cacti User
Posts: 203
Joined: Wed Oct 24, 2007 5:47 pm
Location: San Francisco, CA

Post by khufure »

Sadly, my gaps came back again. I found a corrupted table, and after repairing it the poller is processing better. Well, so far: it's been about an hour.

I am now running two hacks every 10 minutes: one kills old poller processes and the other repairs the tables. I'd prefer not to run them, but that's better than the machine running out of memory and the OOM killer taking out important processes. And Cacti is too useful to turn off.

# kill old cacti processes
*/10 * * * * cacti /usr/local/bin/find_kill_old_cacti_procs.sh
# repair cacti tables
*/10 * * * * root /usr/local/bin/repair_cacti_tables.sh

</usr/local/bin/repair_cacti_tables.sh>
#!/bin/bash
# repair cacti tables
# Caution: myisamchk repairs are only safe while mysqld is not using these tables
/usr/bin/myisamchk --silent --force --fast --update-state \
  --key_buffer_size=64M --sort_buffer_size=64M --read_buffer_size=1M \
  --write_buffer_size=1M /var/lib/mysql/cacti/*.MYI
</>
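The first cron entry calls find_kill_old_cacti_procs.sh, whose contents weren't posted. Below is a minimal sketch of what such a script might do, assuming a Linux box with procps-ng `ps` (for the `etimes` field) and GNU `xargs` (for `-r`). The function name `old_poller_pids` and the 600-second cutoff are hypothetical and not taken from the original script.

```shell
#!/bin/bash
# Hypothetical sketch only; NOT the original find_kill_old_cacti_procs.sh.

# old_poller_pids MAX_AGE
# Reads "PID ELAPSED_SECONDS ARGS..." lines on stdin (the layout produced by
# `ps -o pid=,etimes=,args=`) and prints the PID of every Cacti poller process
# (poller.php, cmd.php or spine) that has run longer than MAX_AGE seconds.
old_poller_pids() {
    local max_age=$1
    awk -v max="$max_age" '$2 + 0 > max + 0 && /poller\.php|cmd\.php|spine/ { print $1 }'
}

# Example use from cron (assumed cutoff: 600 s, i.e. two 5-minute polling cycles):
#   ps -u cacti -o pid=,etimes=,args= | old_poller_pids 600 | xargs -r kill
```

Keeping the selection logic in a function that only prints PIDs makes it easy to dry-run (drop the `xargs -r kill` stage) before letting cron kill anything.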
khufure

Post by khufure »

gandalf wrote: That makes things more complicated. Will try to investigate. But no promise made ...
Reinhard
Yeah, understood. It's frustrating, though :P

I think the hung processes were caused by corruption/locks in the DB; the Tech Support page found some entries. I hope so, anyway!
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

So I'll drop it, then.
Reinhard
khufure

Post by khufure »

Update: my hacks have succeeded in making Cacti stable. The remaining gaps turn out to be on the bandwidth graphs of one server.

What can I do to troubleshoot whatever might be wrong with the bandwidth graphs? If I had to guess, I'd suspect a 64-bit problem.
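If the 64-bit suspicion is about interface counters (an assumption; the post doesn't say), the arithmetic is worth a quick check. A 32-bit octet counter such as IF-MIB `ifInOctets` wraps after 2^32 bytes, and if it wraps more than once within one polling interval the computed rate is simply wrong; 64-bit counters (`ifHCInOctets`) avoid this. Whether that is what causes the gaps here is not established in this thread. A small sketch with hypothetical function names:

```shell
#!/bin/bash
# Hypothetical helpers: how fast a 32-bit octet counter (e.g. IF-MIB ifInOctets)
# wraps. Uses bash 64-bit integer arithmetic; results are truncated to integers.

COUNTER32_BITS=$(( (1 << 32) * 8 ))   # 2^32 octets expressed in bits

# counter32_wrap_seconds LINK_BPS
# Seconds until the counter wraps when the link runs at LINK_BPS bit/s.
counter32_wrap_seconds() {
    echo $(( COUNTER32_BITS / $1 ))
}

# counter32_max_bps POLL_SECONDS
# Highest sustained rate (bit/s) at which the counter wraps at most once
# per polling interval of POLL_SECONDS.
counter32_max_bps() {
    echo $(( COUNTER32_BITS / $1 ))
}
```

For example, `counter32_wrap_seconds 1000000000` prints 34 (a saturated 1 Gbit/s link wraps a 32-bit counter in about 34 seconds), and `counter32_max_bps 300` prints 114532461, so with 5-minute polling a 32-bit counter is only reliable below roughly 114.5 Mbit/s of sustained traffic.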
gandalf

Post by gandalf »

First, see what cacti.log says about it.
Reinhard
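One quick way to follow that advice is to pull out only the log lines for the problem device. This sketch assumes cacti.log lines tag the device as `Host[ID]` (as cmd.php and spine messages commonly do); the function name `host_log_problems` and the log path in the example are hypothetical, so check both against your install.

```shell
#!/bin/bash
# Hypothetical helper; assumes cacti.log lines tag the device as "Host[ID]".

# host_log_problems HOST_ID
# Reads cacti.log on stdin and prints the lines for that host which mention
# a timeout, error or warning (case-insensitive). The closing bracket in the
# fixed-string match keeps Host[1] from also matching Host[12].
host_log_problems() {
    grep -F "Host[$1]" | grep -Ei 'timeout|error|warning'
}

# Example (log path is an assumption; adjust to your install):
#   host_log_problems 12 < /var/www/cacti/log/cacti.log | tail -n 50
```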
foo
Posts: 26
Joined: Tue Feb 24, 2004 12:06 am

Post by foo »

Same problem here. Cacti hangs randomly on certain hosts, usually the same ones, and reports SNMP timeout errors. However, snmpd is working fine on those boxes: I can poll the same information manually and get it instantly. I also tried raising the SNMP timeout for those machines to 5+ seconds, with no change.

Switching back to cmd.php works, but it's not ideal.

This started when I upgraded to 0.8.7b. I am running the latest spine.
ccogdill
Posts: 30
Joined: Wed Apr 25, 2007 1:24 pm
Location: Bismarck, ND

Post by ccogdill »

I too ran into the issue of Cacti hanging on various hosts and eventually timing out, but I traced it down to a problem with using SNMP v2 in the Cacti interface. Once I changed the SNMP version to v1, the problem went away and the host no longer timed out.
ENVIRONMENT
---------------------------
OS: Solaris 10 update 5
Apache 2.2.8
PHP 5.2.5
libxml 2.6.31
MySQL 5.0.51a
rrdtool 1.2.x
Cacti 0.8.7b (with settings 0.5 and thold 0.3.9)
Spine 0.8.7a
Perl 5.10.0
foo

Post by foo »

ccogdill wrote: I too ran into the issue of Cacti hanging on various hosts and eventually timing out, but I traced it down to a problem with using SNMP v2 in the Cacti interface. Once I changed the SNMP version to v1, the problem went away and the host no longer timed out.
That's a workaround, not a resolution. Manual queries work fine with SNMP v2; the bug is in Cacti. I actually stopped using Cacti because of this bug and several, several others.
gandalf

Post by gandalf »

ccogdill wrote: I too ran into the issue of Cacti hanging on various hosts and eventually timing out, but I traced it down to a problem with using SNMP v2 in the Cacti interface. Once I changed the SNMP version to v1, the problem went away and the host no longer timed out.
Are you using the php-snmp or net-snmp libraries, then? Please report the versions of those packages.
Reinhard
ccogdill

Post by ccogdill »

Net-SNMP version 5.0.9.

I used the SNMP agent that comes pre-installed on Solaris 10.
gandalf

Post by gandalf »

I suspect an issue with snmpbulkwalk, which is only triggered when using SNMP v2. Net-SNMP 5.0.9 is waaay old; perhaps bulkwalk is broken there.
Reinhard
aleto
Posts: 39
Joined: Wed May 25, 2005 3:57 am

Re: [SOLVED] Gaps / "nan" returned

Post by aleto »

After upgrading to Cacti 0.8.7g, this problem seems to have returned, except that now the graphs are not updating AT ALL. Well, sometimes they update.

Any suggestions? I've seen the "poller bug". The poller runs from a 1-minute cron with a 1-minute polling interval.