Cacti stops collecting at random hosts - SOLVED

santiagosoares
Posts: 14
Joined: Fri Jun 23, 2006 10:46 am

Post by santiagosoares »

First of all, I'm sorry for my poor English. :oops:

I have Cacti up and running, monitoring about 950 hosts, all of them routers and switches. I'm using a dual Xeon 3.20GHz with 2GB RAM with:

Debian Sarge
Cacti 0.8.6g
Cactid 0.8.6f-1
MySQL: Ver 12.22 Distrib 4.0.24, for pc-linux-gnu (i386)
PHP 4.3.10-16
RRDtool 1.0.49

The problem is that sometimes the polling run finishes in good time, about 55 seconds, but at other times it simply stops collecting data and times out, with no errors.
This is an excerpt from my log file from when it happens:

06/23/2006 12:20:42 PM - CACTID: Poller[0] Host[895] DS[3917] SNMP: v1: 10.66.255.6, dsname: discards_in, oid: .1.3.6.1.2.1.2.2.1.13.2, value: 0
06/23/2006 12:20:42 PM - CACTID: Poller[0] Host[895] DS[3917] SNMP: v1: 10.66.255.6, dsname: errors_in, oid: .1.3.6.1.2.1.2.2.1.14.2, value: 1195300
06/23/2006 12:20:42 PM - CACTID: Poller[0] Host[895] DS[3916] SNMP: v1: 10.66.255.6, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.2, value: 444330804
06/23/2006 12:20:42 PM - CACTID: Poller[0] Host[895] DS[3917] SNMP: v1: 10.66.255.6, dsname: discards_out, oid: .1.3.6.1.2.1.2.2.1.19.2, value: 0
06/23/2006 12:20:42 PM - CACTID: Poller[0] Host[895] DS[3917] SNMP: v1: 10.66.255.6, dsname: errors_out, oid: .1.3.6.1.2.1.2.2.1.20.2, value: 0
06/23/2006 12:24:58 PM - POLLER: Poller[0] Maximum runtime of 296 seconds exceeded. Exiting.

Every host that would have been polled after 10.66.255.6 has a gap in its graph.
And the last host (in this case 10.66.255.6) is a random host, not always the same one.
Does anybody have an idea about what is happening here?
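
One way to measure how long the poller sat idle before each timeout is to scan the log for the last CACTID result that precedes a "Maximum runtime" line. The following is only a minimal sketch: the regular expressions are assumptions based on the log lines quoted above, and `find_stalls` is a hypothetical helper name, not part of Cacti.

```python
import re
from datetime import datetime

# Timestamp, source tag (CACTID / POLLER) and message, as in the excerpt above.
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M) - (\w+): (.*)$")
# Host id and IP address from a CACTID SNMP result line.
HOST_RE = re.compile(r"Host\[(\d+)\].*?SNMP: v\w+: ([\d.]+),")


def find_stalls(lines):
    """Return (host_id, ip, idle_seconds) for each poller timeout found."""
    stalls = []
    last = None  # (timestamp, host_id, ip) of the most recent CACTID result
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%m/%d/%Y %I:%M:%S %p")
        source, message = m.group(2), m.group(3)
        if source == "CACTID":
            h = HOST_RE.search(message)
            if h:
                last = (ts, int(h.group(1)), h.group(2))
        elif source == "POLLER" and "Maximum runtime" in message and last:
            idle = int((ts - last[0]).total_seconds())
            stalls.append((last[1], last[2], idle))
            last = None
    return stalls


sample = [
    "06/23/2006 12:20:42 PM - CACTID: Poller[0] Host[895] DS[3917] SNMP: v1: "
    "10.66.255.6, dsname: errors_out, oid: .1.3.6.1.2.1.2.2.1.20.2, value: 0",
    "06/23/2006 12:24:58 PM - POLLER: Poller[0] Maximum runtime of 296 seconds "
    "exceeded. Exiting.",
]
print(find_stalls(sample))  # [(895, '10.66.255.6', 256)]
```

Run over a whole log file, this would list which host happened to be last before each stall, which makes it easy to confirm that the host really is random.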
Last edited by santiagosoares on Wed Jul 12, 2006 8:26 am, edited 1 time in total.
santiagosoares
Posts: 14
Joined: Fri Jun 23, 2006 10:46 am

Post by santiagosoares »

Anybody???
rony
Developer/Forum Admin
Posts: 6022
Joined: Mon Nov 17, 2003 6:35 pm
Location: Michigan, USA

Post by rony »

I have been seeing this.

What version of net-snmp do you have installed? Are you using Cactid or cmd.php?
Tony Roman
Experience is what causes a person to make new mistakes instead of old ones.
There are only 3 way to complete a project: Good, Fast or Cheap, pick two.
With age comes wisdom, what you choose to do with it determines whether or not you are wise.
santiagosoares
Posts: 14
Joined: Fri Jun 23, 2006 10:46 am

Post by santiagosoares »

NET-SNMP version: 5.1.2
I'm using cactid version 0.8.6f-1
Any clue?
rony
Developer/Forum Admin
Posts: 6022
Joined: Mon Nov 17, 2003 6:35 pm
Location: Michigan, USA

Post by rony »

No clue yet... During the dropouts, does your poller get the 296 second timeout error?
santiagosoares
Posts: 14
Joined: Fri Jun 23, 2006 10:46 am

Post by santiagosoares »

Sorry, I didn't understand your question.
But I do get gaps in the graphs when the poller times out.
The strange thing is that the poller runs almost to the end, then seems to freeze for almost 5 minutes, and then times out.
Look:
06/23/2006 12:20:42 PM - CACTID: Poller[0] Host[895] DS[3917] SNMP: v1: 10.66.255.6, dsname: errors_out, oid: .1.3.6.1.2.1.2.2.1.20.2, value: 0
06/23/2006 12:24:58 PM - POLLER: Poller[0] Maximum runtime of 296 seconds exceeded. Exiting.

And sometimes it finishes the poll in about 55 seconds.
santiagosoares
Posts: 14
Joined: Fri Jun 23, 2006 10:46 am

Post by santiagosoares »

Running the poller manually, I get this output:

OK u:0.12 s:0.20 r:41.46
OK u:0.12 s:0.20 r:41.46
OK u:0.12 s:0.20 r:41.46
OK u:0.12 s:0.20 r:41.46
OK u:0.12 s:0.20 r:41.46
Waiting on 1/1 pollers.
Waiting on 1/1 pollers.
Waiting on 1/1 pollers.
.
.
.
Waiting on 1/1 pollers.
Waiting on 1/1 pollers.
Waiting on 1/1 pollers.
06/26/2006 02:10:42 PM - POLLER: Poller[0] Maximum runtime of 296 seconds exceeded. Exiting.

Any idea???
santiagosoares
Posts: 14
Joined: Fri Jun 23, 2006 10:46 am

Post by santiagosoares »

Anyone?
rony
Developer/Forum Admin
Posts: 6022
Joined: Mon Nov 17, 2003 6:35 pm
Location: Michigan, USA

Post by rony »

glib version on the box?

I am currently working on a box with similar issues, but it is running CentOS 4.3. Everything at this point is telling me it's a MySQL issue.
santiagosoares
Posts: 14
Joined: Fri Jun 23, 2006 10:46 am

Post by santiagosoares »

glib version 2.6.4
Please, let me know if you found something about this issue.
I'm quite desperate here... :(
rony
Developer/Forum Admin
Posts: 6022
Joined: Mon Nov 17, 2003 6:35 pm
Location: Michigan, USA

Post by rony »

What version of GCC?
santiagosoares
Posts: 14
Joined: Fri Jun 23, 2006 10:46 am

Post by santiagosoares »

gcc 3.3.5
f0llia
Posts: 21
Joined: Fri Apr 22, 2005 4:06 am

Post by f0llia »

Hi all, I have the same problem: Cacti stops collecting data and drawing graphs for some random hosts.
I've installed it on this Gentoo Linux box:

Portage 2.1-r1 (default-linux/x86/2006.0, gcc-3.4.5, glibc-2.3.6-r3, 2.6.16-gentoo-r7 i686)
=================================================================
System uname: 2.6.16-gentoo-r7 i686 Intel(R) Celeron(R) CPU 2.80GHz
Gentoo Base System version 1.6.14
dev-lang/python:     2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     [Not Present]
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
sys-devel/binutils:  2.16.1
sys-devel/gcc-config: 1.3.13-r2
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r2
and the Cacti version is:

 net-analyzer/cacti
      Latest version available: 0.8.6h_p20060108-r2
      Latest version installed: 0.8.6h_p20060108-r2
      Size of files: 1,080 kB
      Homepage:      http://www.cacti.net/
      Description:   Cacti is a complete frontend to rrdtool
      License:       GPL-2
How can I fix this?

Thanks all
santiagosoares
Posts: 14
Joined: Fri Jun 23, 2006 10:46 am

Post by santiagosoares »

I've found something that I think can be interesting.
When I run

# ps aux

during polling, I get these lines for Cacti:

cacti    23846  0.0  0.0  2768 1216 ?        Ss   17:40   0:00 /bin/sh -c   php /usr/share/cacti/poller.php > /dev/null 2>&1
cacti    23847  4.9  0.4 17740 9220 ?        S    17:40   0:00 php /usr/share/cacti/poller.php
cacti    23850  1.3  0.3 53404 7844 ?        S    17:40   0:00 /usr/share/cactid/cactid 0 1346
cacti    23851  0.9  0.0  4208 1096 ?        S    17:40   0:00 /usr/bin/rrdtool -
At the exact moment the poller stops collecting, I get this:

cacti    23846  0.0  0.0  2768 1216 ?        Ss   17:40   0:00 /bin/sh -c   php /usr/share/cacti/poller.php > /dev/null 2>&1
cacti    23847  1.5  0.4 17740 9220 ?        S    17:40   0:01 php /usr/share/cacti/poller.php
cacti    23851  0.2  0.0  4208 1096 ?        S    17:40   0:00 /usr/bin/rrdtool -
There is no cactid process!
Is that normal, or is it a clue that could help solve my problem?
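
To catch this condition automatically instead of watching `ps aux` by hand, the same check can be scripted. This is a rough sketch only: `cactid_missing` is a hypothetical helper, and it assumes the standard 11-column `ps aux` layout shown in the listings above.

```python
import os


def cactid_missing(ps_output):
    """True if poller.php is running but no cactid process is listed."""
    poller_running = False
    cactid_running = False
    for line in ps_output.splitlines():
        # ps aux prints 10 fixed columns followed by the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10]
        if "poller.php" in command:
            poller_running = True
        # Compare the executable name only, so /usr/share/cactid/... paths
        # of other programs are not mistaken for the cactid binary.
        if os.path.basename(command.split()[0]) == "cactid":
            cactid_running = True
    return poller_running and not cactid_running


during = """\
cacti    23847  4.9  0.4 17740 9220 ?        S    17:40   0:00 php /usr/share/cacti/poller.php
cacti    23850  1.3  0.3 53404 7844 ?        S    17:40   0:00 /usr/share/cactid/cactid 0 1346
cacti    23851  0.9  0.0  4208 1096 ?        S    17:40   0:00 /usr/bin/rrdtool -
"""
stalled = """\
cacti    23847  1.5  0.4 17740 9220 ?        S    17:40   0:01 php /usr/share/cacti/poller.php
cacti    23851  0.2  0.0  4208 1096 ?        S    17:40   0:00 /usr/bin/rrdtool -
"""
print(cactid_missing(during), cactid_missing(stalled))  # False True
```

Feeding it the output of `ps aux` every few seconds during a poll would give a timestamp for the moment cactid disappears, which can then be matched against the log.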
rony
Developer/Forum Admin
Posts: 6022
Joined: Mon Nov 17, 2003 6:35 pm
Location: Michigan, USA

Post by rony »

Cactid is dying... Do you have any "core" files lying around in the polling user's home directory, or in the cactid directory?

Also, how many hosts do you have? If you have more than 30, I would suggest at least 2 processes in the poller settings.
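
A quick way to look for those core files is sketched below. The directory names in the usage comment are assumptions: `/usr/share/cactid` is taken from the `ps` output earlier in the thread, and the home directory of the `cacti` user has to be filled in for the system at hand. `find_core_files` is a hypothetical helper. Note that no core file is written at all if core dumps are disabled for the process (for example with `ulimit -c 0`), so an empty result does not prove that cactid exited cleanly.

```python
import os


def find_core_files(directories):
    """Return paths of files named 'core' or 'core.<suffix>' in the given dirs."""
    found = []
    for directory in directories:
        if not os.path.isdir(directory):
            continue  # skip directories that do not exist on this system
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            is_core = name == "core" or name.startswith("core.")
            if is_core and os.path.isfile(path):
                found.append(path)
    return found


# Example usage (paths are assumptions, adjust them to the local setup):
# print(find_core_files(["/usr/share/cactid", os.path.expanduser("~cacti")]))
```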