Graphs stop - RHEL3 + 0.8.6g + cactid 0.8.6f

thickas
Posts: 4
Joined: Wed Nov 30, 2005 11:53 pm

Post by thickas »

Dear Folks,

Any advice about this persistent problem with Cacti/Cactid is most welcome.

Cacti (0.8.6g) + Cactid (0.8.6f) were both installed from source RPM on the problem RHEL3 host (an unstressed dual Opteron HP 385).

Graphs work fine _for a while_ and then stop. When this happens, the RRD files still appear to be updated (ls -l shows a modification time within the last 5 minutes), although no real data is going in; only zeros are stored. Here's an example:

[sh1517@acisf011 rra]$ rrdtool fetch armidale_optus_ce_interface_traffic_in_307.rrd AVERAGE -s -2h | perl -ne 's/^(\d+): (.*)/localtime($1) . " $2"/e; print '
traffic_in traffic_out

Thu Dec 1 14:25:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 14:30:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 14:35:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 14:40:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 14:45:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 14:50:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 14:55:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 15:00:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 15:05:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 15:10:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 15:15:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 15:20:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 15:25:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 15:30:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 15:35:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 15:40:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 15:45:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 15:50:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 15:55:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 16:00:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 16:05:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 16:10:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 16:15:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 16:20:00 2005 0.0000000000e+00 0.0000000000e+00
Thu Dec 1 16:25:00 2005 nan nan

[sh1517@acisf011 rra]$ ls -l armidale_optus_ce_interface_traffic_in_307.rrd
-rw-r--r-- 1 cactiuser cactiuser 94664 Dec 1 16:20 armidale_optus_ce_interface_traffic_in_307.rrd
[sh1517@acisf011 rra]$
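For anyone who wants to spot this symptom automatically, here is a minimal sketch that parses raw `rrdtool fetch` output and reports whether the most recent known rows are all zero. It assumes the unfiltered output format (epoch timestamp, colon, values), not the perl-filtered form shown above; the function names, the `min_rows` threshold, and the timestamps in the sample are made up for illustration.

```python
import math

def parse_fetch(text):
    """Parse raw `rrdtool fetch` output into (ds_names, rows)."""
    names, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" not in line:
            names = line.split()  # header line listing the data source names
            continue
        stamp, _, rest = line.partition(":")
        values = [float(v) for v in rest.split()]  # float() accepts "nan"
        rows.append((int(stamp), values))
    return names, rows

def flatlined(rows, min_rows=6):
    """True if the last `min_rows` non-NaN rows contain only zeros."""
    known = [vals for _, vals in rows if not any(math.isnan(v) for v in vals)]
    tail = known[-min_rows:]
    return len(tail) == min_rows and all(v == 0.0 for vals in tail for v in vals)

# Illustrative sample: the timestamps are hypothetical, the values mirror the post.
SAMPLE = """\
                     traffic_in         traffic_out

1133411100: 0.0000000000e+00 0.0000000000e+00
1133411400: 0.0000000000e+00 0.0000000000e+00
1133411700: 0.0000000000e+00 0.0000000000e+00
1133412000: 0.0000000000e+00 0.0000000000e+00
1133412300: 0.0000000000e+00 0.0000000000e+00
1133412600: 0.0000000000e+00 0.0000000000e+00
1133412900: nan nan
"""

names, rows = parse_fetch(SAMPLE)
print(names, "flatlined:", flatlined(rows))
```

A genuinely idle interface also produces zeros, so a hit from this check is only a hint that the data source deserves a manual snmpget.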

Whether or not the RRD _is_ updated, the cacti log consistently shows

12/01/2005 04:10:03 PM - CACTID: Poller[0] Host[17] DS[211] WARNING: Result from SNMP not valid. Partial Result: ...
12/01/2005 04:10:04 PM - CACTID: Poller[0] Host[33] DS[697] WARNING: Result from SNMP not valid. Partial Result: ...
12/01/2005 04:10:06 PM - CACTID: Poller[0] Host[102] DS[7200] WARNING: Result from SNMP not valid. Partial Result: ...
12/01/2005 04:10:08 PM - SYSTEM STATS: Time:5.4212 Method:cactid Processes:1 Threads:50 Hosts:64 HostsPerProcess:64 DataSources:2131 RRDsProcessed:1177

12/01/2005 04:15:05 PM - CACTID: Poller[0] Host[17] DS[211] WARNING: Result from SNMP not valid. Partial Result: ...
12/01/2005 04:15:06 PM - CACTID: Poller[0] Host[33] DS[697] WARNING: Result from SNMP not valid. Partial Result: ...
12/01/2005 04:15:08 PM - CACTID: Poller[0] Host[102] DS[7200] WARNING: Result from SNMP not valid. Partial Result: ...
12/01/2005 04:15:09 PM - SYSTEM STATS: Time:5.4281 Method:cactid Processes:1 Threads:50 Hosts:64 HostsPerProcess:64 DataSources:2131 RRDsProcessed:1177

12/01/2005 04:20:04 PM - CACTID: Poller[0] Host[17] DS[211] WARNING: Result from SNMP not valid. Partial Result: ...
12/01/2005 04:20:04 PM - CACTID: Poller[0] Host[33] DS[697] WARNING: Result from SNMP not valid. Partial Result: ...
12/01/2005 04:20:06 PM - CACTID: Poller[0] Host[102] DS[7200] WARNING: Result from SNMP not valid. Partial Result: ...
12/01/2005 04:20:08 PM - SYSTEM STATS: Time:5.4080 Method:cactid Processes:1 Threads:50 Hosts:64 HostsPerProcess:64 DataSources:2131 RRDsProcessed:1177
[sh1517@acisf011 log]$

i.e. some warnings about partial results.
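To see which data sources keep producing these warnings, a rough sketch like the following can tally them per host and data source. It assumes the log line layout shown above; the regular expression and the function name are my own and not part of Cacti.

```python
import re
from collections import Counter

# Matches the Host[..] DS[..] part of the warning lines quoted in the post.
WARNING_RE = re.compile(
    r"Host\[(\d+)\] DS\[(\d+)\] WARNING: Result from SNMP not valid"
)

def count_invalid_results(log_text):
    """Return a Counter keyed by (host_id, ds_id) for each SNMP warning."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts

# Two polling cycles copied from the log excerpt above.
LOG = """\
12/01/2005 04:10:03 PM - CACTID: Poller[0] Host[17] DS[211] WARNING: Result from SNMP not valid. Partial Result: ...
12/01/2005 04:10:04 PM - CACTID: Poller[0] Host[33] DS[697] WARNING: Result from SNMP not valid. Partial Result: ...
12/01/2005 04:10:06 PM - CACTID: Poller[0] Host[102] DS[7200] WARNING: Result from SNMP not valid. Partial Result: ...
12/01/2005 04:15:05 PM - CACTID: Poller[0] Host[17] DS[211] WARNING: Result from SNMP not valid. Partial Result: ...
12/01/2005 04:15:06 PM - CACTID: Poller[0] Host[33] DS[697] WARNING: Result from SNMP not valid. Partial Result: ...
12/01/2005 04:15:08 PM - CACTID: Poller[0] Host[102] DS[7200] WARNING: Result from SNMP not valid. Partial Result: ...
"""

for (host, ds), n in sorted(count_invalid_results(LOG).items()):
    print(f"Host[{host}] DS[{ds}]: {n} warning(s)")
```

In this excerpt the same three data sources are flagged on every cycle, which suggests the warnings may be a separate, steady condition, not something tied to the moment a particular graph stops.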

If the device corresponding to the graph is deleted and then re-added, all is well .. for a few days and then another graph stops.

The only conclusion I can draw is that this _is_ a load-related problem. There seems to be a threshold of data sources/hosts/RRDs below which all is well, but above which the graphing becomes erratic.

Does this sound like other folks' experience?

If so, what can be done about it?

I see a thread about multiple poller support being in CVS and scheduled for the 0.0.0 +- 1 release.

Is this still current?

Is it possible to have multiple Cacti instances on one host with _multiple_ hacked cactid pollers as of 0.8.6[fg]? I think it would be necessary to have multiple crontabs and multiple poller config files (changing the config file processing in the cactid source). Is this feasible?

Thank you,
thickas
Posts: 4
Joined: Wed Nov 30, 2005 11:53 pm

Post by thickas »

After enabling DEBUG logging in Cacti, the poller log shows that it is updating the RRD with exactly what I am seeing: zeros.

12/02/2005 11:05:03 AM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/html/cacti/rra/armidale_optus_ce_interface_traffic_in_307.rrd --template traffic_in:traffic_out 1133481902:0:0

12/02/2005 11:10:05 AM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/html/cacti/rra/armidale_optus_ce_interface_traffic_in_307.rrd --template traffic_in:traffic_out 1133482204:0:0

12/02/2005 11:15:05 AM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update /var/www/html/cacti/rra/armidale_optus_ce_interface_traffic_in_307.rrd --template traffic_in:traffic_out 1133482504:0:0
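As a sketch of how these CACTI2RRD debug lines could be checked in bulk, the snippet below pulls the RRD path, timestamp and per-data-source values out of an `rrdtool update` line and flags updates where every value is a literal 0. The regular expression and helper names are assumptions of mine, based only on the line layout shown above.

```python
import re

# Layout assumed: rrdtool update <file> --template <ds1:ds2> <time>:<v1>:<v2>
UPDATE_RE = re.compile(r"rrdtool update (\S+) --template (\S+) (\d+):(\S+)")

def parse_update(line):
    """Return a dict describing the update, or None if the line does not match."""
    match = UPDATE_RE.search(line)
    if match is None:
        return None
    path, template, stamp, values = match.groups()
    return {
        "rrd": path,
        "time": int(stamp),
        "values": dict(zip(template.split(":"), values.split(":"))),
    }

def is_all_zero(update):
    """True if every data source in the update was written as a literal 0."""
    return all(value == "0" for value in update["values"].values())

# Line copied from the debug log above.
LINE = (
    "12/02/2005 11:05:03 AM - POLLER: Poller[0] CACTI2RRD: /usr/bin/rrdtool update "
    "/var/www/html/cacti/rra/armidale_optus_ce_interface_traffic_in_307.rrd "
    "--template traffic_in:traffic_out 1133481902:0:0"
)

update = parse_update(LINE)
print(update["rrd"], update["values"], "all zero:", is_all_zero(update))
```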

I have tried increasing the memory allocation in php.ini, but it did not help.

Any suggestions are very welcome.
thickas
Posts: 4
Joined: Wed Nov 30, 2005 11:53 pm

Post by thickas »

It seems to me that the poller is not working too well. As you can see from the log, the poller consistently returns zero for the problem host 192.168.1.54, whereas a manual snmpget returns a non-zero value.

12/02/2005 10:55:04 AM - CACTID: Poller[0] Host[7] DS[307] SNMP: v1: 192.168.1.54, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.3, value: 0

12/02/2005 11:00:05 AM - CACTID: Poller[0] Host[7] DS[307] SNMP: v1: 192.168.1.54, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.3, value: 0

12/02/2005 11:00:05 AM - CACTID: Poller[0] Host[7] DS[307] SNMP: v1: 192.168.1.54, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.3, value: 0

12/02/2005 11:05:03 AM - CACTID: Poller[0] Host[7] DS[307] SNMP: v1: 192.168.1.54, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.3, value: 0

12/02/2005 11:05:03 AM - CACTID: Poller[0] Host[7] DS[307] SNMP: v1: 192.168.1.54, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.3, value: 0

12/02/2005 11:10:05 AM - CACTID: Poller[0] Host[7] DS[307] SNMP: v1: 192.168.1.54, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.3, value: 0

[sh1517@acisf011 log]$ snmpget -v 1 -c foo 192.168.1.54 .1.3.6.1.2.1.2.2.1.10.3
IF-MIB::ifInOctets.3 = Counter32: 134909671

[sh1517@acisf011 log]$ snmpget -v 1 -c foo 192.168.1.54 .1.3.6.1.2.1.2.2.1.16.3
IF-MIB::ifOutOctets.3 = Counter32: 97969282
[sh1517@acisf011 log]$
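The comparison above (poller says 0, manual snmpget says otherwise) can be sketched as a small check. This only parses text that has already been captured; it does not query any device. The patterns and function names are illustrative assumptions derived from the output format shown in this post.

```python
import re

# Tolerates the comma before "value:" landing on either side of a line break.
POLL_RE = re.compile(r"dsname: (\w+), oid: ([.\d]+)\s*,\s*value: (\S+)")
SNMPGET_RE = re.compile(r"= (\w+): (\d+)")

def poller_value(log_entry):
    """Return (dsname, oid, value) from a CACTID debug entry, or None."""
    match = POLL_RE.search(log_entry)
    return match.groups() if match else None

def snmpget_value(output):
    """Return (type, value) from a line of snmpget output, or None."""
    match = SNMPGET_RE.search(output)
    return (match.group(1), int(match.group(2))) if match else None

def suspicious(log_entry, snmpget_output):
    """True if the poller logged 0 while a manual snmpget shows a non-zero value."""
    polled = poller_value(log_entry)
    manual = snmpget_value(snmpget_output)
    if polled is None or manual is None:
        return False
    return polled[2] == "0" and manual[1] != 0

# Both strings are copied from the post.
ENTRY = (
    "12/02/2005 11:10:05 AM - CACTID: Poller[0] Host[7] DS[307] SNMP: v1: "
    "192.168.1.54, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.3, value: 0"
)
MANUAL = "IF-MIB::ifInOctets.3 = Counter32: 134909671"

print(poller_value(ENTRY), snmpget_value(MANUAL), suspicious(ENTRY, MANUAL))
```

Because the counters keep increasing, the two readings are never expected to be equal; the check only looks for a polled zero against a manual non-zero.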


Once again, any clues are most welcome.
flundy
Posts: 13
Joined: Wed Oct 13, 2004 3:38 pm

Post by flundy »

I am using the 0.8.6f cactid poller and I have this problem too, but only with the load average polls. Everything else (CPU, network, site-local scripts) works.

Are there patches to cactid 0.8.6f that need to be applied? I downloaded it in the last week of November and assumed it was current... Any help would be appreciated.
thickas
Posts: 4
Joined: Wed Nov 30, 2005 11:53 pm

Post by thickas »

To conclude for the benefit of the archives, I have forsaken cactid and started using php-snmp (cmd.php in the settings/poller page).

The PHP poller appears to

1 poll everything

2 do so in a time that is only 2-4 times slower than cactid (about 20 sec for ~1.1k RRDs, instead of 5-10 sec with cactid).


[root@acisf011 log]# tail -f cacti.log
Method:cmd.php Processes:50 Threads:N/A Hosts:64 HostsPerProcess:2 DataSources:2131 RRDsProcessed:1177
12/05/2005 02:00:15 PM - CMDPHP: Poller[0] Host[7] DS[330] WARNING: Result from SNMP not valid. Partial Result:
12/05/2005 02:00:15 PM - CMDPHP: Poller[0] Host[7] DS[330] WARNING: Result from SNMP not valid. Partial Result:
12/05/2005 02:00:16 PM - CMDPHP: Poller[0] Host[71] DS[7347] WARNING: Result from SNMP not valid. Partial Result:
12/05/2005 02:00:17 PM - CMDPHP: Poller[0] Host[17] DS[212] WARNING: Result from SNMP not valid. Partial Result:
12/05/2005 02:00:17 PM - CMDPHP: Poller[0] Host[17] DS[212] WARNING: Result from SNMP not valid. Partial Result:
12/05/2005 02:00:18 PM - CMDPHP: Poller[0] Host[33] DS[720] WARNING: Result from SNMP not valid. Partial Result:
12/05/2005 02:00:18 PM - CMDPHP: Poller[0] Host[33] DS[720] WARNING: Result from SNMP not valid. Partial Result:

12/05/2005 02:00:24 PM - SYSTEM STATS: Time:20.3044 Method:cmd.php Processes:50 Threads:N/A Hosts:64 HostsPerProcess:2 DataSources:2131 RRDsProcessed:1177


12/05/2005 02:05:16 PM - CMDPHP: Poller[0] Host[7] DS[330] WARNING: Result from SNMP not valid. Partial Result:
12/05/2005 02:05:16 PM - CMDPHP: Poller[0] Host[7] DS[330] WARNING: Result from SNMP not valid. Partial Result:
12/05/2005 02:05:16 PM - CMDPHP: Poller[0] Host[71] DS[7347] WARNING: Result from SNMP not valid. Partial Result:
12/05/2005 02:05:17 PM - CMDPHP: Poller[0] Host[17] DS[212] WARNING: Result from SNMP not valid. Partial Result:
12/05/2005 02:05:17 PM - CMDPHP: Poller[0] Host[17] DS[212] WARNING: Result from SNMP not valid. Partial Result:
12/05/2005 02:05:18 PM - CMDPHP: Poller[0] Host[33] DS[720] WARNING: Result from SNMP not valid. Partial Result:
12/05/2005 02:05:18 PM - CMDPHP: Poller[0] Host[33] DS[720] WARNING: Result from SNMP not valid. Partial Result:

12/05/2005 02:05:24 PM - SYSTEM STATS: Time:21.9656 Method:cmd.php Processes:50 Threads:N/A Hosts:64 HostsPerProcess:2 DataSources:2131 RRDsProcessed:1177

3 get all the graphs going again (along with all the applications that hang off the RRDs, such as Weathermap4RRD).
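For comparing poller runs, the SYSTEM STATS lines can be split into key/value pairs. The sketch below does that for one cactid line and one cmd.php line taken from the logs in this thread and computes the slowdown; the helper name is hypothetical and the parsing assumes the space-separated `Key:Value` layout shown in those lines.

```python
def parse_stats(line):
    """Split a SYSTEM STATS log line into a dict of string fields."""
    _, _, tail = line.partition("SYSTEM STATS: ")
    stats = {}
    for field in tail.split():
        key, _, value = field.partition(":")
        stats[key] = value
    return stats

# Lines copied from the cactid and cmd.php logs quoted in this thread.
CACTID_LINE = (
    "12/01/2005 04:10:08 PM - SYSTEM STATS: Time:5.4212 Method:cactid Processes:1 "
    "Threads:50 Hosts:64 HostsPerProcess:64 DataSources:2131 RRDsProcessed:1177"
)
CMDPHP_LINE = (
    "12/05/2005 02:00:24 PM - SYSTEM STATS: Time:20.3044 Method:cmd.php Processes:50 "
    "Threads:N/A Hosts:64 HostsPerProcess:2 DataSources:2131 RRDsProcessed:1177"
)

cactid = parse_stats(CACTID_LINE)
cmdphp = parse_stats(CMDPHP_LINE)

# Ratio of wall-clock poll times for the same 2131 data sources.
slowdown = float(cmdphp["Time"]) / float(cactid["Time"])
print(f"{cmdphp['Method']} took {slowdown:.1f}x as long as {cactid['Method']}")
```

For these two samples the ratio falls inside the 2-4 times range mentioned above, and both runs report the same DataSources and RRDsProcessed counts.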

It is too early to say for sure if the PHP poller will be reliable since cactid was known to run _fine_ for 1-5 days before the graphs would stop, but this is very encouraging.

Thanks very much for a great product. I am so pleased that it is working again.
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

The new version of Cactid, 0.8.6g-beta, should correct the issues with SNMP. It will likely be released this week. It is available under the Announcement forum for testing and feedback.

TheWitness
flundy
Posts: 13
Joined: Wed Oct 13, 2004 3:38 pm

Post by flundy »

Thanks again for all your efforts.