Spine SNMP timeout detected [500 ms], ignoring host

Post by **skol** » Thu Feb 24, 2011 11:05 am

I've searched extensively, and I have only found posts asking about this warning message, with no clear solution or ideas beyond the obvious "it's your network, host, or Cacti server".

Issue:

This entry fills the log file: "WARNING: SNMP timeout detected [500 ms], ignoring host 'servername'". It is printed for 99% of the 1600+ hosts in this Cacti instance during EACH polling cycle. Data still gets collected, and the graphs keep graphing. The systems are online and most are under normal load. The network is fine.

However, there is a small subset of hosts whose graphs look like this:

[Attached image: graph_image.php.png]

Each host has 99% SNMP availability, can be polled instantly via SNMP by hand, and is monitored by Nagios, which never alerts on them. In short, the hosts are fine. Other graphs that rely on non-SNMP data, such as local scripts, never, ever have an issue.

This happens no matter which Cacti setting is changed, be it:

- SNMP Timeout
- SNMP version
- SNMP Retries
- Host availability settings
- Maximum SNMP OID's Per SNMP Get Request

or the device-specific SNMP Timeout settings.

Cacti environment details:

Code:

Spine Version 0.8.7e
mysql-5.0.77-3.el5

Technical Support
General Information
  Date                  Thu, 24 Feb 2011 09:59:01 -0500
  Cacti Version         0.8.7e
  Cacti OS              unix
  SNMP Version          NET-SNMP version: 5.3.2.2
  RRDTool Version       RRDTool 1.3.x
  Hosts                 1715
  Graphs                31207
  Data Sources          Script/Command: 1996
                        SNMP: 36833
                        SNMP Query: 21836
                        Script Query: 158
                        Script - Script Server (PHP): 5
                        Total: 60828
Poller Information
  Interval              300
  Type                  spine
  Items                 Action[0]: 75688
                        Action[1]: 2534
                        Action[2]: 5
                        Total: 78227
  Concurrent Processes  1
  Max Threads           35
  PHP Servers           10
  Script Timeout        25
  Max OID               5
  Last Run Statistics   Time:190.3967 Method:spine Processes:1 Threads:35 Hosts:1716 HostsPerProcess:1716 DataSources:78227 RRDsProcessed:51782
PHP Information
  PHP Version           5.1.6
  PHP OS                Linux
  PHP uname             Linux graph1 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64
  PHP SNMP              Installed
  max_execution_time    30
  memory_limit          800M

spine-0.8.7d did not exhibit this issue; it started immediately after we upgraded to 0.8.7e. The problem does not happen at all with the cmd.php poller (however, we can't use cmd.php on this instance because it takes too long).

In our development Cacti environment, I've replicated our production install (detailed above) on a VM with all but one host disabled. When Spine runs its polling cycle, it still logs the same entry: "WARNING: SNMP timeout detected [500 ms], ignoring host 'servername'". However, as stated above, everything is fine and continues to graph.

I've also upgraded our development environment to the latest Spine and Cacti, with all patches applied to both. The problem persists: even with zero load on the Cacti server and only one other idle test host in the entire install, it still logs the timeout warnings.
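A quick way to see which hosts trigger this warning, and how often, is to tally it per host from cacti.log. The sketch below is a hypothetical helper (`count_snmp_timeouts` is not part of Cacti or Spine). It assumes GNU grep and that the warning text matches the lines quoted in this thread, and it defaults to the `/var/log/cacti/cacti.log` path that appears in a Spine debug run later in the thread.

```shell
# Hypothetical helper (not part of Cacti or Spine): count Spine
# "SNMP timeout detected" warnings per host in a Cacti log file.
# Assumes GNU grep (-o) and the warning format quoted in this thread:
#   WARNING: SNMP timeout detected [500 ms], ignoring host 'servername'
count_snmp_timeouts() {
  local logfile
  logfile=${1:-/var/log/cacti/cacti.log}
  # Extract the warning fragment, reduce it to the quoted host name,
  # then count per host and list the noisiest hosts first.
  grep -o "SNMP timeout detected \[[0-9]* ms\], ignoring host '[^']*'" "$logfile" |
    sed "s/.*ignoring host '\([^']*\)'/\1/" |
    sort | uniq -c | sort -rn
}
```

For example, `count_snmp_timeouts /var/log/cacti/cacti.log | head` shows the ten hosts that logged the most warnings, which makes it easy to compare the hosts whose graphs have gaps against the rest.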

So, to recap:

1) What's the deal with these warning messages?
2) Why do some hosts randomly seem to be unavailable to Spine when they are clearly available?

Any thoughts? Thanks!

Post by **radiumfu** » Thu Feb 24, 2011 2:59 pm

I have the same problem.

General Information
  Date                  Thu, 24 Feb 2011 11:20:29 -0800
  Cacti Version         0.8.7g
  Cacti OS              unix
  SNMP Version          NET-SNMP version: 5.1.2
  RRDTool Version       RRDTool 1.2.x
  Hosts                 39
  Graphs                331
  Data Sources          Script/Command: 24
                        SNMP: 73
                        SNMP Query: 237
                        Script Query: 2
                        Script - Script Server (PHP): 1
                        Total: 337
Poller Information
  Interval              60
  Type                  spine
  Items                 Action[0]: 560
                        Action[1]: 28
                        Action[2]: 1
                        Total: 589
  Concurrent Processes  1
  Max Threads           1
  PHP Servers           1
  Script Timeout        25
  Max OID               10
  Last Run Statistics   Time:6.3016 Method:spine Processes:1 Threads:1 Hosts:40 HostsPerProcess:40 DataSources:454 RRDsProcessed:237
PHP Information
  PHP Version           5.1.6
  PHP OS                Linux
  PHP uname             Linux s7itm03 2.6.9-89.0.16.plus.c4smp #1 SMP Tue Nov 3 18:15:39 EST 2009 i686
  PHP SNMP              Installed
  max_execution_time    30
  memory_limit          128M

Cacti Version - 0.8.7g
Plugin Architecture - 2.8
Poller Type - Cactid v
Server Info - Linux 2.6.9-89.0.16.plus.c4smp
Web Server - Apache/2.0.63 (CentOS)
PHP - 5.1.6
PHP Extensions - libxml, xml, wddx, tokenizer, sysvshm, sysvsem, sysvmsg, standard, SimpleXML, sockets, SPL, shmop, session, Reflection, pspell, posix, mime_magic, iconv, hash, gmp, gettext, ftp, exif, date, curl, ctype, calendar, bz2, zlib, pcre, openssl, apache2handler, gd, ldap, mysql, mysqli, PDO, pdo_mysql, pdo_sqlite, snmp, eAccelerator
MySQL - 5.0.82sp1
RRDTool - 1.2.23
SNMP - 5.1.2
Plugins

Realtime for Cacti (realtime - v0.35)
Host Info (hostinfo - v0.2)
Device Tracking (mactrack - v1.1)
Global Plugin Settings (settings - v0.5)
Device Monitoring (monitor - v0.8.2)
PHP Network Weathermap (weathermap - v0.97a)
Update Checker (update - v0.4)

===============================
I found the issue when I upgraded Spine from 0.8.7e to 0.8.7g. The strange thing is that only 3 hosts have the issue (pictures of 2 of them attached); most hosts are normal. All 3 are Cisco 3845 routers running BGP (but one other 3845 is fine, so not every 3845 has the issue). For these 3845s I also capture CPU/memory utilization; the difference is that these data sources are polled every 5 minutes, and they stopped working completely on Spine 0.8.7g.
Finally, as you can see, once I rolled back to 0.8.7e the issue went away, so most likely something changed in the new Spine version.

And this is how I upgraded Spine:

Code:

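wget http://www.cacti.net/downloads/spine/cacti-spine-0.8.7g.tar.gz
tar zxf cacti-spine-0.8.7g.tar.gz
# the patch has to be applied from inside the unpacked source tree
cd cacti-spine-0.8.7g
wget http://www.cacti.net/downloads/spine/patches/0.8.7g/unified_issues.patch
patch -p1 -N < unified_issues.patch
./configure --with-results-buffer=2048
make
make install
# create spine.conf from the shipped sample, then set the DB_Host,
# DB_Database, DB_User and DB_Pass directives
cd /usr/local/spine/bin/
cp spine.conf.dist spine.conf
vi spine.conf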

Then I upgraded my Cacti Plugin Architecture ("cacti-plugin-0.8.7g-PA-v2.8.tar.gz"), since I had just upgraded Cacti from 0.8.7c to 0.8.7g.

Post by **blueman176** » Mon Feb 28, 2011 5:20 am

Same problem here!

Post by **gandalf** » Thu Mar 03, 2011 3:17 pm

Please try the latest Spine from SVN. We have made some fixes in that area recently.
R.

Post by **LC's Disciple** » Wed Mar 09, 2011 7:45 am

Hello,

I'm experiencing the same problem, but it mainly affects MySQL database hosts.
I see long response times when looking in "Devices", and the graphs are cluttered.

I run three Cacti sites, all using Spine 0.8.7e and the same version of PHP.
Two of the sites have problems with database hosts (and sometimes others),
but one site works fine with very good response times. Besides geography, the only
relevant difference I've found is the SNMP version.

The two bad sites have: NET-SNMP 5.3.1
The good site has: NET-SNMP 5.3.2.2

I don't know if this is the actual problem, and I can't easily update and test because of our change-management processes.
But it will be updated sooner or later.

The question remains whether this makes sense, and whether anyone else has had a similar problem and solved it
by updating the SNMP version. Any other ideas on this problem?
Did anyone update Spine as gandalf suggested? How did it go?

Cheers!

Post by **skol** » Wed Mar 09, 2011 1:17 pm

LC's Disciple wrote: Hello,

I'm experiencing the same problem, but it mainly affects MySQL database hosts.
I see long response times when looking in "Devices", and the graphs are cluttered.

Not sure what your problem is, from that description. What does a graph being cluttered have to do with SNMP versions?

If it is hard for you to upgrade net-snmp on a system then you've got bigger issues. Remember you'll need to rebuild spine after upgrading.

We're using 5.3.2.2 as described above.
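Spine's verbose output reports both the Net-SNMP header version it was compiled against and the library version it is running with (the `SNMP Header Version` and `SNMP Library Version` DEBUG lines in the debug run later in this thread). The sketch below compares the two; a mismatch would suggest Spine was not rebuilt after a Net-SNMP upgrade. The function name is hypothetical, and it assumes those two DEBUG lines appear in the output.

```shell
# Hypothetical helper: read Spine verbose output on stdin and compare the
# Net-SNMP header version (compile time) with the library version (run time).
# Returns 0 on a match, 1 on a mismatch, 2 if the lines are missing.
check_spine_snmp_versions() {
  local out header lib
  out=$(cat)
  header=$(printf '%s\n' "$out" | sed -n 's/.*SNMP Header Version is \([^ ]*\).*/\1/p' | head -n1)
  lib=$(printf '%s\n' "$out" | sed -n 's/.*SNMP Library Version is \([^ ]*\).*/\1/p' | head -n1)
  if [ -z "$header" ] || [ -z "$lib" ]; then
    echo "versions not found"
    return 2
  fi
  if [ "$header" = "$lib" ]; then
    echo "match: $header"
  else
    echo "MISMATCH: built against $header, running with $lib"
    return 1
  fi
}
```

For example, `/usr/bin/spine --verbosity=5 4 4 2>&1 | check_spine_snmp_versions`, where `4 4` are the first and last host IDs to poll, as in the debug run later in the thread; substitute one of your own host IDs.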

Post by **LC's Disciple** » Thu Mar 10, 2011 8:09 am

skol wrote: Not sure what your problem is, from that description. What does a graph being cluttered have to do with SNMP versions?

That's what I wanted to know: is there a connection? But you run a later version of NET-SNMP, so maybe it's not that. The graphs are
cluttered because of the high response time to the host. Many graphs use SNMP to retrieve the data, but not all, and all the graphs are crap,
so I was simply looking for any feedback on the SNMP angle. Are you seeing high response times when using SNMP as the "Downed Device Detection"
method?

skol wrote: If it is hard for you to upgrade net-snmp on a system then you've got bigger issues. Remember you'll need to rebuild spine after upgrading.

It's not hard, it's inconvenient.

Thanks for the reply, and good luck solving your problem.

Post by **skol** » Thu Mar 10, 2011 8:47 am

LC's Disciple wrote: That's what I wanted to know: is there a connection? But you run a later version of NET-SNMP, so maybe it's not that. The graphs are
cluttered because of the high response time to the host. Many graphs use SNMP to retrieve the data, but not all, and all the graphs are crap,
so I was simply looking for any feedback on the SNMP angle. Are you seeing high response times when using SNMP as the "Downed Device Detection"
method?

No, we aren't having that issue.

Post by **skol** » Fri Mar 11, 2011 10:54 am

Using spine from SVN did not solve the issue.

Post by **kyosanim** » Thu Apr 21, 2011 7:14 am

Hey guys,
I have the same issue with my new 0.8.7g installation.

Cacti Version 0.8.7g
Cacti OS unix
SNMP Version NET-SNMP version: 5.4.2.1
RRDTool Version RRDTool 1.3.x
SPINE 0.8.7g

cmd.php works fine. With Spine I get:
04/21/2011 01:59:10 PM - SPINE: Poller[0] Host[3] TH[1] DS[65] WARNING: SNMP timeout detected [1000 ms], ignoring host 'x.x.x.x'

After some trial and error, I think the issue is related to snmpbulkwalk in combination with Spine and some (not all) hosts. After I reduced "Maximum OID's Per Get Request" in the device settings, the error went away, but this massively slows down the polling time.

Can anyone confirm this behavior?

Regards,
Markus
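A back-of-the-envelope sketch of why lowering the OID limit slows polling down. It assumes Spine packs a host's SNMP polling items into GET requests carrying at most "Maximum OID's Per Get Request" OIDs each, so the request count is a ceiling division; `requests_needed` is a made-up name for illustration.

```shell
# Hypothetical helper: number of SNMP GET requests needed to poll `items`
# OIDs when each request carries at most `max_oids` OIDs (ceiling division).
requests_needed() {
  local items max_oids
  items=$1
  max_oids=$2
  if [ "$max_oids" -lt 1 ]; then
    echo "max_oids must be at least 1" >&2
    return 1
  fi
  echo $(( (items + max_oids - 1) / max_oids ))
}

requests_needed 643 8   # prints 81
requests_needed 643 1   # prints 643
```

With the 643 polling items reported for one host later in this thread, a limit of 8 means 81 requests per cycle, while a limit of 1 means 643, each with its own network round trip.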

Post by **skol** » Thu Apr 21, 2011 9:07 pm

It turns out it was a script polling items that was taking SECONDS to return results. I fixed the script to return results in milliseconds, which solved the issue.
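If you suspect a slow data input script, as skol found, timing it by hand is a quick check. The helpers below are hypothetical (not part of Cacti). They assume GNU `date` for the `%N` nanosecond field, and the budget you compare against is your own choice, for example something well under the 25-second Script Timeout shown in the Technical Support output at the top of the thread.

```shell
# Hypothetical helpers for timing a poller script by hand.
# Requires GNU date, whose %N format gives nanoseconds.

# Print the wall-clock runtime of a command in milliseconds.
script_runtime_ms() {
  local start end
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# Report whether a command finishes within budget_ms milliseconds.
# Prints SLOW and returns 1 when the budget is exceeded.
warn_if_slow() {
  local budget_ms ms
  budget_ms=$1
  shift
  ms=$(script_runtime_ms "$@")
  if [ "$ms" -gt "$budget_ms" ]; then
    echo "SLOW: $* took ${ms} ms (budget ${budget_ms} ms)"
    return 1
  fi
  echo "ok: $* took ${ms} ms"
}
```

For example, `warn_if_slow 1000 /usr/bin/php /path/to/your_script.php arg1`, where the script path and arguments are placeholders for the script behind the affected data sources.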

Post by **jehan.procaccia** » Mon Jul 11, 2011 9:45 am

I have the same problem.

I downgraded from spine-0.8.7g to spine-0.8.7e, without success.
I set "The Maximum SNMP OID's Per SNMP Get Request" to many different values (from 20 down to 0/1), without success.
Unlike skol, I don't have scripts (I think... it's a Cisco 6500 with many interface counters).
What else can I do?

Here's a run on that host (ID = 4). There are indeed many data sources ("There are '643' Polling Items for this Host"),
but it worked fine before....

$ /usr/bin/spine --verbosity=5 4 4
SPINE: Using spine config file [/etc/spine.conf]
WARNING: Unrecongized directive: DB_PreG=0 in /etc/spine.conf
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The path_php_server variable is /usr/share/cacti/script_server.php
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The path_cactilog variable is /var/log/cacti/cacti.log
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The log_destination variable is 1 (FILE)
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The path_php variable is /usr/bin/php
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The availability_method variable is 2
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The ping_recovery_count variable is 1
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The ping_failure_count variable is 2
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The ping_method variable is 2
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The ping_retries variable is 1
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The ping_timeout variable is 400
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The snmp_retries variable is 3
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The log_perror variable is 0
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The log_pwarn variable is 0
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The boost_redirect variable is 0
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The log_pstats variable is 0
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The threads variable is 50
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The polling interval is 300 seconds
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The number of concurrent processes is 1
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The script timeout is 25
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The number of php script servers to run is 10
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: StartHost='4', EndHost='4', TotalPHPScripts='0'
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The PHP Script Server is Not Required
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The Maximum SNMP OID Get Size is 8
07/11/2011 03:58:26 PM - SPINE: Poller[0] Version 0.8.7e starting
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: MySQL is Thread Safe!
07/11/2011 03:58:26 PM - SPINE: Poller[0] SPINE: Initializing Net-SNMP API
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: SNMP Header Version is 5.3.2.2
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: SNMP Library Version is 5.3.2.2
07/11/2011 03:58:26 PM - SPINE: Poller[0] SPINE: Initializing PHP Script Server(s)
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: Initial Value of Active Threads is 0
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 2
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
07/11/2011 03:58:26 PM - SPINE: Poller[0] Host[0] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
07/11/2011 03:58:26 PM - SPINE: Poller[0] Host[4] DEBUG: ICMP Host Alive, Try Count:1, Time:136.9909 ms
07/11/2011 03:58:26 PM - SPINE: Poller[0] Host[4] PING Result: ICMP: Host is Alive
07/11/2011 03:58:26 PM - SPINE: Poller[0] Host[4] SNMP Result: SNMP not performed due to setting or ping result
07/11/2011 03:58:26 PM - SPINE: Poller[0] Host[4] RECACHE: Processing 1 items in the auto reindex cache for '157.159.8.2'
07/11/2011 03:58:27 PM - SPINE: Poller[0] Host[4] NOTE: There are '643' Polling Items for this Host
07/11/2011 03:58:28 PM - SPINE: Poller[0] Host[4] DS[23] SNMP: v2: 157.159.8.2, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.1, value: 3462470578
...
07/11/2011 03:58:39 PM - SPINE: Poller[0] Host[4] DS[147] SNMP: v2: 157.159.8.2, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.127, value: 3852501608
07/11/2011 03:58:45 PM - SPINE: Poller[0] Host[4] DS[148] WARNING: SNMP timeout detected [1500 ms], ignoring host '157.159.8.2'
07/11/2011 03:58:45 PM - SPINE: Poller[0] Host[4] DS[148] SNMP: v2: 157.159.8.2, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.128, value: U
...
07/11/2011 03:58:51 PM - SPINE: Poller[0] Host[4] DS[154] SNMP: v2: 157.159.8.2, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.134, value: U
07/11/2011 03:58:51 PM - SPINE: Poller[0] Host[4] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
07/11/2011 03:58:51 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 0
07/11/2011 03:58:51 PM - SPINE: Poller[0] DEBUG: Thread Cleanup Complete
07/11/2011 03:58:51 PM - SPINE: Poller[0] DEBUG: PHP Script Server Pipes Closed
07/11/2011 03:58:51 PM - SPINE: Poller[0] DEBUG: Allocated Variable Memory Freed
07/11/2011 03:58:51 PM - SPINE: Poller[0] DEBUG: MYSQL Free & Close Completed
07/11/2011 03:58:51 PM - SPINE: Poller[0] Time: 24.8853 s, Threads: 50, Hosts: 2

Thanks for any help .

Post by **jehan.procaccia** » Mon Jul 11, 2011 12:04 pm

Forget it...
It turns out that my switch (c6500) really was overloaded!
CPU was at 98% due to an Xcast flood, so SNMP really was timing out...
This time Cacti indirectly helped me find a switch overload.

Now that the flood is over, Cacti + Spine works fine.
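One way to read the debug run above: the last good value (DS[147]) is logged at 03:58:39 and the timeout warning for DS[148] at 03:58:45, a gap of about 6 seconds. That is consistent with the reported 1500 ms timeout being waited once plus 3 retries, assuming the host uses the global `snmp_retries` value of 3 shown in the run and that the SNMP library waits the full timeout on every attempt without backoff. Both of those are assumptions, and `worst_case_ms` is a hypothetical name.

```shell
# Hypothetical helper: worst-case time (ms) a single unanswered SNMP request
# can block a polling thread, ASSUMING the full timeout is waited on the
# first attempt and on each retry (no backoff).
worst_case_ms() {
  local timeout_ms retries
  timeout_ms=$1
  retries=$2
  echo $(( timeout_ms * (retries + 1) ))
}

worst_case_ms 1500 3   # prints 6000, roughly the 6 s gap seen in the log
worst_case_ms 500 3    # prints 2000
```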

Post by **DDJ** » Sat Jul 20, 2013 10:34 pm

OK guys, I had the same problem here, but fixed it with the right config in the data template for my graph: in the "data source item" section, I set "minimum value = 0" and "maximum value = U", based on this guide: http://www.cacti.net/downloads/docs/htm ... mbers.html

Now "WARNING: SNMP timeout detected [500 ms], ignoring host 'x.x.x.x'" is gone from cacti.log. I hope this helps some of you.

Cacti 0.8.8a here, on Debian Wheezy 7.1.

Post by **NitrousOxyde** » Mon Aug 07, 2017 1:01 am

Hi comrades,

I've got the same problem: some RRDs just don't get updated each time the poller runs (Spine in my case).

I tried deleting and re-adding the problematic hosts, but then other ones stopped being updated. File ownership is the same on all files, updating and non-updating RRDs alike (root:root), because poller.php runs as root. File mode is also 644 for every file.

I've tried every method I can think of, but nothing helps. One property shared by all the affected hosts is that they are DMVPN spokes and connect to the Cacti server "behind" the hub router via IPSec tunnels. Interestingly, routers reached through the hub router over direct/indirect Ethernet do not fail the way the spoke routers do.
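To see exactly which RRDs are being skipped, and to check that their ownership and mode match the ones that do update, you can list files that have not been written recently. This is a hypothetical helper that assumes GNU find and GNU stat; the RRA directory and the age threshold are placeholders to adjust, for example two polling intervals (10 minutes for a 300-second interval).

```shell
# Hypothetical helper: list *.rrd files under a directory that have NOT been
# modified within the last N minutes, printing owner:group, octal mode, path.
# Requires GNU find (-mmin) and GNU stat (-c).
list_stale_rrds() {
  local dir minutes
  dir=$1
  minutes=${2:-10}
  find "$dir" -type f -name '*.rrd' -mmin +"$minutes" \
    -exec stat -c '%U:%G %a %n' {} +
}
```

For example, `list_stale_rrds /path/to/cacti/rra 10`; if only the DMVPN spokes' files show up, that narrows the problem to how those hosts are reached rather than to file permissions.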
