Issue:
The following entry fills the log file: "WARNING: SNMP timeout detected [500 ms], ignoring host 'servername'". It is printed for 99% of the 1600+ hosts in this specific Cacti instance during EACH polling cycle. Data still gets collected and graphs keep graphing. Systems are online and most are under normal load conditions. The network is fine.
However, there is a very small subset of hosts whose graphs look like this:
Each of these hosts shows 99% SNMP availability and can be polled instantly via SNMP by hand. They are also monitored by Nagios, which never alerts. In short, the hosts are fine. Other graphs that rely on non-SNMP data, such as local scripts, never have an issue.
This still happens no matter which setting is changed in Cacti, be it:
- SNMP Timeout
- SNMP version
- SNMP Retries
- Host availability settings
- Maximum SNMP OID's Per SNMP Get Request
or the device-specific SNMP Timeout settings.
Cacti environment details:
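To put a number on how widespread the warning is in a given polling cycle, the log can be tallied per host. The following is a minimal sketch, assuming only the warning text quoted above; the function name `count_timeouts` and the host names in the sample are hypothetical, and nothing is assumed about the rest of the log line (timestamps, poller prefixes).

```python
import re
from collections import Counter

# Pattern built from the warning text quoted in this post. The surrounding
# log line layout is not assumed, so search() is used instead of match().
TIMEOUT_RE = re.compile(
    r"WARNING: SNMP timeout detected \[(\d+) ms\], ignoring host '([^']+)'"
)


def count_timeouts(lines):
    """Return a Counter mapping host name -> number of timeout warnings."""
    counts = Counter()
    for line in lines:
        found = TIMEOUT_RE.search(line)
        if found:
            counts[found.group(2)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, pass an open log file instead.
    sample = [
        "WARNING: SNMP timeout detected [500 ms], ignoring host 'hostA'",
        "some unrelated log entry",
        "WARNING: SNMP timeout detected [500 ms], ignoring host 'hostB'",
        "WARNING: SNMP timeout detected [500 ms], ignoring host 'hostA'",
    ]
    for host, hits in count_timeouts(sample).most_common():
        print(host, hits)
```

Comparing the number of distinct hosts in the tally against the host count reported by Cacti would show whether the "99% of hosts" estimate holds for each cycle.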
Spine Version 0.8.7e
mysql-5.0.77-3.el5
Technical Support

General Information
  Date: Thu, 24 Feb 2011 09:59:01 -0500
  Cacti Version: 0.8.7e
  Cacti OS: unix
  SNMP Version: NET-SNMP version: 5.3.2.2
  RRDTool Version: RRDTool 1.3.x
  Hosts: 1715
  Graphs: 31207
  Data Sources:
    Script/Command: 1996
    SNMP: 36833
    SNMP Query: 21836
    Script Query: 158
    Script - Script Server (PHP): 5
    Total: 60828

Poller Information
  Interval: 300
  Type: spine
  Items:
    Action[0]: 75688
    Action[1]: 2534
    Action[2]: 5
    Total: 78227
  Concurrent Processes: 1
  Max Threads: 35
  PHP Servers: 10
  Script Timeout: 25
  Max OID: 5
  Last Run Statistics: Time:190.3967 Method:spine Processes:1 Threads:35 Hosts:1716 HostsPerProcess:1716 DataSources:78227 RRDsProcessed:51782

PHP Information
  PHP Version: 5.1.6
  PHP OS: Linux
  PHP uname: Linux graph1 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64
  PHP SNMP: Installed
  max_execution_time: 30
  memory_limit: 800M
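For reference, the last-run statistics above can be turned into a quick headroom check against the 300-second poller interval. This is only a sketch of the arithmetic: the helper names `parse_stats` and `summarize` are hypothetical, and the statistics string is copied from the output above.

```python
# Statistics string copied from the Technical Support output above.
STATS_LINE = (
    "Time:190.3967 Method:spine Processes:1 Threads:35 Hosts:1716 "
    "HostsPerProcess:1716 DataSources:78227 RRDsProcessed:51782"
)
POLLER_INTERVAL = 300  # seconds, from "Interval 300"


def parse_stats(line):
    """Split 'Key:value' tokens into a dict, converting numbers to float."""
    stats = {}
    for token in line.split():
        key, _, value = token.partition(":")
        try:
            stats[key] = float(value)
        except ValueError:
            stats[key] = value  # non-numeric field, e.g. Method:spine
    return stats


def summarize(stats, interval):
    """Compute the share of the interval used, spare time, and throughput."""
    runtime = stats["Time"]
    return {
        "utilization": runtime / interval,
        "headroom_s": interval - runtime,
        "items_per_s": stats["DataSources"] / runtime,
    }


if __name__ == "__main__":
    for name, value in summarize(parse_stats(STATS_LINE), POLLER_INTERVAL).items():
        print(f"{name}: {value:.2f}")
```

With these figures the poller run uses roughly 63% of the interval, so the run finishes well within the polling window.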
In our development Cacti environment, I've replicated our prod install (described above) to a VM with all but one host disabled. When Spine runs its polling cycle, you still get the same log entry: "WARNING: SNMP timeout detected [500 ms], ignoring host 'servername'". However, as stated above, everything is fine and continues to graph.
Also, I've upgraded our development environment to the latest Spine and Cacti, with all the patches for both. This continues to happen. Even with zero load on the Cacti server and only one other idle test host in the entire install, it still displays the timeout warnings.
So, to recap:
1) What's the deal with these warning messages?
2) Why do some hosts randomly seem to be unavailable to Spine when they are clearly available?
Any thoughts? Thanks!