Spine SNMP timeout detected [500 ms], ignoring host

skol
Posts: 41
Joined: Mon Nov 10, 2003 3:06 pm

Post by skol »

I've searched extensively and have only found posts asking about this warning message, but no clear solution or ideas beyond the obvious "it's your network, host, or Cacti server".

Issue:

The following entry fills the logfile: "WARNING: SNMP timeout detected [500 ms], ignoring host 'servername'". It is printed for 99% of the 1600+ hosts in this specific Cacti instance during EACH polling cycle. Data still gets collected and graphs keep graphing. Systems are online and most are under normal load conditions. The network is fine.

However, for a very small subset of hosts the graphs look like this:
[Attachment: graph_image.php.png]
Each host has a 99% SNMP availability uptime, and can be instantly polled via SNMP either by hand, or is monitored by nagios (and they never alert) - in short the hosts are fine. Other graphs which rely on non-snmp data such as local scripts never, ever have an issue.

This still happens no matter which setting is changed in Cacti, be it:

- SNMP Timeout
- SNMP version
- SNMP Retries
- Host availability settings
- Maximum SNMP OID's Per SNMP Get Request

or the SNMP Timeout device specific settings.

Cacti environment details:

Code:

Spine Version 0.8.7e
mysql-5.0.77-3.el5

Technical Support
General Information
Date 	Thu, 24 Feb 2011 09:59:01 -0500
Cacti Version 	0.8.7e
Cacti OS 	unix
SNMP Version 	NET-SNMP version: 5.3.2.2
RRDTool Version 	RRDTool 1.3.x
Hosts 	1715
Graphs 	31207
Data Sources 	Script/Command: 1996
SNMP: 36833
SNMP Query: 21836
Script Query: 158
Script - Script Server (PHP): 5
Total: 60828
Poller Information
Interval 	300
Type 	spine
Items 	Action[0]: 75688
Action[1]: 2534
Action[2]: 5
Total: 78227
Concurrent Processes 	1
Max Threads 	35
PHP Servers 	10
Script Timeout 	25
Max OID 	5
Last Run Statistics 	Time:190.3967 Method:spine Processes:1 Threads:35 Hosts:1716 HostsPerProcess:1716 DataSources:78227 RRDsProcessed:51782
PHP Information
PHP Version 	5.1.6
PHP OS 	Linux
PHP uname 	Linux graph1 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64
PHP SNMP 	Installed
max_execution_time 	30
memory_limit 	800M

spine-0.8.7d did not exhibit this issue. When we upgraded to 0.8.7e this immediately started happening. This problem does not happen at all when you switch to the cmd.php poller (however, we can't use the cmd.php poller on this instance because it takes too long).

In our development Cacti environment I've replicated our prod install (pictured above) to a VM with all but one host disabled. When spine runs its polling cycle, you still get the same logfile entry: "WARNING: SNMP timeout detected [500 ms], ignoring host 'servername'". However, as stated above, everything is fine and continues to graph.

Also, I've upgraded our development environment to the latest spine and Cacti, with all the patches for both. This continues to happen. Even with 0 load on the Cacti server and only one other idle test host in the entire install, it still displays the timeout warnings.

So, to recap :

1) What's the deal with these warning messages?
2) Why do some hosts randomly seem to be unavailable to spine when they are clearly available?

Any thoughts? Thanks!
Last edited by skol on Thu Feb 24, 2011 3:51 pm, edited 1 time in total.
radiumfu
Posts: 8
Joined: Fri Jul 14, 2006 7:18 am
Location: Los Angeles

Post by radiumfu »

I have the same problem.

General Information
Date Thu, 24 Feb 2011 11:20:29 -0800
Cacti Version 0.8.7g
Cacti OS unix
SNMP Version NET-SNMP version: 5.1.2
RRDTool Version RRDTool 1.2.x
Hosts 39
Graphs 331
Data Sources Script/Command: 24
SNMP: 73
SNMP Query: 237
Script Query: 2
Script - Script Server (PHP): 1
Total: 337
Poller Information
Interval 60
Type spine
Items Action[0]: 560
Action[1]: 28
Action[2]: 1
Total: 589
Concurrent Processes 1
Max Threads 1
PHP Servers 1
Script Timeout 25
Max OID 10
Last Run Statistics Time:6.3016 Method:spine Processes:1 Threads:1 Hosts:40 HostsPerProcess:40 DataSources:454 RRDsProcessed:237
PHP Information
PHP Version 5.1.6
PHP OS Linux
PHP uname Linux s7itm03 2.6.9-89.0.16.plus.c4smp #1 SMP Tue Nov 3 18:15:39 EST 2009 i686
PHP SNMP Installed
max_execution_time 30
memory_limit 128M
Cacti Version - 0.8.7g
Plugin Architecture - 2.8
Poller Type - Cactid v
Server Info - Linux 2.6.9-89.0.16.plus.c4smp
Web Server - Apache/2.0.63 (CentOS)
PHP - 5.1.6
PHP Extensions - libxml, xml, wddx, tokenizer, sysvshm, sysvsem, sysvmsg, standard, SimpleXML, sockets, SPL, shmop, session, Reflection, pspell, posix, mime_magic, iconv, hash, gmp, gettext, ftp, exif, date, curl, ctype, calendar, bz2, zlib, pcre, openssl, apache2handler, gd, ldap, mysql, mysqli, PDO, pdo_mysql, pdo_sqlite, snmp, eAccelerator
MySQL - 5.0.82sp1
RRDTool - 1.2.23
SNMP - 5.1.2
Plugins
  • Realtime for Cacti (realtime - v0.35)
  • Host Info (hostinfo - v0.2)
  • Device Tracking (mactrack - v1.1)
  • Global Plugin Settings (settings - v0.5)
  • Device Monitoring (monitor - v0.8.2)
  • PHP Network Weathermap (weathermap - v0.97a)
  • Update Checker (update - v0.4)
===============================
I found the issue when I upgraded spine from 0.8.7e to 0.8.7g. The strange thing is that only 3 hosts have the issue (pictures of 2 of them are attached); most hosts are normal. All 3 are Cisco 3845 routers running BGP (though one other 3845 is fine, so not every 3845 is affected). For these 3845s I also capture CPU/memory utilization; the difference is that those data sources are polled every 5 minutes, and they stopped working completely on spine 0.8.7g.
Finally, as you can see, once I rolled back to 0.8.7e the issue was gone. Most likely something changed in the new spine version.

and this is how I upgraded spine:

Code:

wget http://www.cacti.net/downloads/spine/cacti-spine-0.8.7g.tar.gz
tar zxf cacti-spine-0.8.7g.tar.gz 
wget http://www.cacti.net/downloads/spine/patches/0.8.7g/unified_issues.patch
patch -p1 -N < unified_issues.patch
./configure --with-results-buffer=2048
make
make install
cd /usr/local/spine/bin/
cp spine.conf.dist spine.conf
vi spine.conf

Then I upgraded the Cacti Plugin Architecture ("cacti-plugin-0.8.7g-PA-v2.8.tar.gz"), as I had just upgraded Cacti from 0.8.7c to 0.8.7g.
Attachments:
- 4_aggregate_cacti_pic_rollback.png: 2 router aggregate traffic, after rollback to spine 0.8.7e
- 4_aggregate_cacti_pic_isue.png: 2 router aggregate traffic, with issue
- 4-spine-timeout.png: spine timeout error
- 4-sr_cacti_pic_issue2.png: secondary router
- 4-pr_cacti_pic_issue1.png: primary router
blueman176
Posts: 19
Joined: Fri Mar 05, 2010 8:18 am

Post by blueman176 »

Same problem too!
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

Please try latest spine from SVN. We had some fixes in that direction lately.
R.
LC's Disciple
Posts: 2
Joined: Wed Mar 09, 2011 6:32 am

Post by LC's Disciple »

Hello,

I'm experiencing the same problem, but it mainly affects MySQL database hosts.
I get long response times when looking in "Devices", and the graphs are cluttered.

I run three Cacti sites, and all of them use Spine 0.8.7e and the same version of PHP.
Two of the sites have problems with database hosts (and sometimes others),
but one site works fine with very good response times. The only relevant difference
I've found, besides geography, is the NET-SNMP version.

The two bad sites have: NET-SNMP 5.3.1
The good site has: NET-SNMP 5.3.2.2

I don't know if this is the actual problem, and I can't easily update and test because of the
change processes involved. But it will be updated sooner or later.

The question remains whether this makes sense, and whether anyone else has had a similar problem and solved it
by updating NET-SNMP. Or are there any other ideas on this problem?
Did anyone update Spine as gandalf suggested? How did it go?

Cheers!
skol
Posts: 41
Joined: Mon Nov 10, 2003 3:06 pm

Post by skol »

LC's Disciple wrote: Hello,

I'm experiencing the same problem, but it mainly affects MySQL database hosts.
I get long response times when looking in "Devices", and the graphs are cluttered.

Not sure what your problem is, from that description. What does a graph being cluttered have to do with SNMP versions?

If it is hard for you to upgrade net-snmp on a system then you've got bigger issues. Remember you'll need to rebuild spine after upgrading.

We're using 5.3.2.2 as described above.
LC's Disciple
Posts: 2
Joined: Wed Mar 09, 2011 6:32 am

Post by LC's Disciple »

skol wrote: Not sure what your problem is, from that description. What does a graph being cluttered have to do with SNMP versions?

That's what I wanted to know: does it have a connection? But you run a later version of NET-SNMP, so maybe it's not that. The graphs are
cluttered because of the high response time to the host. Many graphs use SNMP to retrieve the data, but not all, and all the graphs are crap,
so I was simply looking for any feedback on the SNMP thing. Are you seeing high response times when using SNMP as the "Downed Device Detection"
method?

skol wrote: If it is hard for you to upgrade net-snmp on a system then you've got bigger issues. Remember you'll need to rebuild spine after upgrading.

It is not hard, it is inconvenient :P

Thanks for the reply, and good luck with solving your problem :)
skol
Posts: 41
Joined: Mon Nov 10, 2003 3:06 pm

Post by skol »

LC's Disciple wrote: That's what I wanted to know: does it have a connection? But you run a later version of NET-SNMP, so maybe it's not that. The graphs are
cluttered because of the high response time to the host. Many graphs use SNMP to retrieve the data, but not all, and all the graphs are crap,
so I was simply looking for any feedback on the SNMP thing. Are you seeing high response times when using SNMP as the "Downed Device Detection"
method?

No, we aren't having that issue.
skol
Posts: 41
Joined: Mon Nov 10, 2003 3:06 pm

Post by skol »

Using spine from SVN did not solve the issue.
kyosanim
Posts: 37
Joined: Thu Apr 05, 2007 5:33 am

Post by kyosanim »

Hey guys,
I have the same issue with my new 0.8.7g installation.

Cacti Version 0.8.7g
Cacti OS unix
SNMP Version NET-SNMP version: 5.4.2.1
RRDTool Version RRDTool 1.3.x
SPINE 0.8.7g

cmd.php works fine. With spine I get:
04/21/2011 01:59:10 PM - SPINE: Poller[0] Host[3] TH[1] DS[65] WARNING: SNMP timeout detected [1000 ms], ignoring host 'x.x.x.x'
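
To see which hosts trigger this warning and how often, the log can be tallied per host. The following is a rough sketch, not a Cacti tool: `count_timeouts` is a hypothetical helper, and the regular expression assumes the line format seen in this thread (other spine versions may log differently). The sample lines are copied from posts in this thread; in practice you would pass the open cacti.log file instead.

```python
import re
from collections import Counter

# Assumed format, taken from log lines quoted in this thread:
# ... Host[3] TH[1] DS[65] WARNING: SNMP timeout detected [1000 ms], ignoring host 'x.x.x.x'
TIMEOUT_RE = re.compile(
    r"Host\[(?P<host_id>\d+)\].*WARNING: SNMP timeout detected "
    r"\[(?P<timeout_ms>\d+) ms\], ignoring host '(?P<host>[^']+)'"
)


def count_timeouts(lines):
    """Return a Counter mapping (host_id, hostname) to the number of timeout warnings."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(int(match.group("host_id")), match.group("host"))] += 1
    return counts


# Sample lines copied from this thread; the third one is not a timeout warning.
sample = [
    "04/21/2011 01:59:10 PM - SPINE: Poller[0] Host[3] TH[1] DS[65] WARNING: "
    "SNMP timeout detected [1000 ms], ignoring host 'x.x.x.x'",
    "07/11/2011 03:58:45 PM - SPINE: Poller[0] Host[4] DS[148] WARNING: "
    "SNMP timeout detected [1500 ms], ignoring host '157.159.8.2'",
    "07/11/2011 03:58:26 PM - SPINE: Poller[0] Host[4] PING Result: ICMP: Host is Alive",
]

timeouts = count_timeouts(sample)
for (host_id, host), n in timeouts.most_common():
    print(f"Host[{host_id}] {host}: {n} timeout warning(s)")
```

If only a handful of hosts dominate the tally, that points at those devices or their data sources rather than at spine as a whole.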

After some trial and error, I think the issue is related to snmpbulkwalk in combination with spine and some hosts (not all hosts). After I reduced "Maximum OID's Per Get Request" in the device settings, the error went away,
but this massively slows down polling.

Can anyone confirm this behavior?

Regards,
Markus
skol
Posts: 41
Joined: Mon Nov 10, 2003 3:06 pm

Post by skol »

Turns out it was a script which was polling items that was taking SECONDS to return results. I fixed the script to return results in milliseconds which solved the issue.
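
One way to check whether a data input script is in the same situation is to time it by hand. This is a minimal sketch, not part of Cacti or spine: `time_command` and `is_too_slow` are hypothetical helper names, the 25-second limit mirrors the "Script Timeout" shown in the first post, and the 500 ms budget is only borrowed from the warning text as an illustrative threshold. The command being timed is a stand-in; replace it with the real script's command line.

```python
import subprocess
import sys
import time


def time_command(argv, timeout_s=25.0):
    """Run argv once and return (elapsed_seconds, return_code).

    Raises subprocess.TimeoutExpired if the command runs longer than timeout_s.
    """
    start = time.perf_counter()
    completed = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    return time.perf_counter() - start, completed.returncode


def is_too_slow(elapsed_s, budget_ms=500):
    """True if the run took longer than the (illustrative) budget in milliseconds."""
    return elapsed_s * 1000.0 > budget_ms


# Stand-in for a real data input script: a trivial Python one-liner.
elapsed, rc = time_command([sys.executable, "-c", "print('value:1')"])
print(f"rc={rc} elapsed={elapsed * 1000:.0f} ms too_slow={is_too_slow(elapsed)}")
```

Running the real script several times, ideally while the poller is active, gives a better picture than a single measurement.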
jehan.procaccia
Posts: 22
Joined: Tue Nov 13, 2007 10:19 am

Post by jehan.procaccia »

I have the same problem.

I downgraded from spine-0.8.7g to spine-0.8.7e without success.
I set "The Maximum SNMP OID's Per SNMP Get Request" to many values (from 20 down to 0/1) without success.
Unlike skol, I don't have scripts (I guess... it's for a Cisco 6500 with many interface counters).
What else can I do?

Here's a run on that host (ID = 4). Indeed, there are many data sources:
There are '643' Polling Items for this Host
but it worked fine before...
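
For a feel of how the "Maximum OID's Per Get Request" setting trades request count against request size, here is a back-of-the-envelope sketch. It assumes, as a simplification, that all 643 polling items are plain SNMP gets that can be batched up to the configured maximum; `snmp_requests_needed` is a hypothetical helper, not a spine function, and real request counts may differ.

```python
import math


def snmp_requests_needed(polling_items, max_oids_per_request):
    """Rough number of SNMP get requests if every item can be batched."""
    if max_oids_per_request < 1:
        raise ValueError("max_oids_per_request must be at least 1 in this sketch")
    return math.ceil(polling_items / max_oids_per_request)


# 643 polling items, as reported for this host, at a few candidate settings.
for max_oids in (20, 8, 1):
    print(f"max OIDs {max_oids:2d}: about {snmp_requests_needed(643, max_oids)} requests")
```

Under that assumption, a setting of 8 means roughly 81 requests per cycle for this host, while a setting of 1 means 643, which is consistent with the earlier report that lowering the value slows polling considerably.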

$ /usr/bin/spine --verbosity=5 4 4
SPINE: Using spine config file [/etc/spine.conf]
WARNING: Unrecongized directive: DB_PreG=0 in /etc/spine.conf
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The path_php_server variable is /usr/share/cacti/script_server.php
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The path_cactilog variable is /var/log/cacti/cacti.log
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The log_destination variable is 1 (FILE)
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The path_php variable is /usr/bin/php
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The availability_method variable is 2
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The ping_recovery_count variable is 1
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The ping_failure_count variable is 2
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The ping_method variable is 2
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The ping_retries variable is 1
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The ping_timeout variable is 400
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The snmp_retries variable is 3
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The log_perror variable is 0
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The log_pwarn variable is 0
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The boost_redirect variable is 0
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The log_pstats variable is 0
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The threads variable is 50
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The polling interval is 300 seconds
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The number of concurrent processes is 1
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The script timeout is 25
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The number of php script servers to run is 10
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: StartHost='4', EndHost='4', TotalPHPScripts='0'
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The PHP Script Server is Not Required
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The Maximum SNMP OID Get Size is 8
07/11/2011 03:58:26 PM - SPINE: Poller[0] Version 0.8.7e starting
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: MySQL is Thread Safe!
07/11/2011 03:58:26 PM - SPINE: Poller[0] SPINE: Initializing Net-SNMP API
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: SNMP Header Version is 5.3.2.2
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: SNMP Library Version is 5.3.2.2
07/11/2011 03:58:26 PM - SPINE: Poller[0] SPINE: Initializing PHP Script Server(s)
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: Initial Value of Active Threads is 0
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 2
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
07/11/2011 03:58:26 PM - SPINE: Poller[0] Host[0] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
07/11/2011 03:58:26 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
07/11/2011 03:58:26 PM - SPINE: Poller[0] Host[4] DEBUG: ICMP Host Alive, Try Count:1, Time:136.9909 ms
07/11/2011 03:58:26 PM - SPINE: Poller[0] Host[4] PING Result: ICMP: Host is Alive
07/11/2011 03:58:26 PM - SPINE: Poller[0] Host[4] SNMP Result: SNMP not performed due to setting or ping result
07/11/2011 03:58:26 PM - SPINE: Poller[0] Host[4] RECACHE: Processing 1 items in the auto reindex cache for '157.159.8.2'
07/11/2011 03:58:27 PM - SPINE: Poller[0] Host[4] NOTE: There are '643' Polling Items for this Host
07/11/2011 03:58:28 PM - SPINE: Poller[0] Host[4] DS[23] SNMP: v2: 157.159.8.2, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.1, value: 3462470578
...
07/11/2011 03:58:39 PM - SPINE: Poller[0] Host[4] DS[147] SNMP: v2: 157.159.8.2, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.127, value: 3852501608
07/11/2011 03:58:45 PM - SPINE: Poller[0] Host[4] DS[148] WARNING: SNMP timeout detected [1500 ms], ignoring host '157.159.8.2'
07/11/2011 03:58:45 PM - SPINE: Poller[0] Host[4] DS[148] SNMP: v2: 157.159.8.2, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.128, value: U
...
07/11/2011 03:58:51 PM - SPINE: Poller[0] Host[4] DS[154] SNMP: v2: 157.159.8.2, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.134, value: U
07/11/2011 03:58:51 PM - SPINE: Poller[0] Host[4] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
07/11/2011 03:58:51 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 0
07/11/2011 03:58:51 PM - SPINE: Poller[0] DEBUG: Thread Cleanup Complete
07/11/2011 03:58:51 PM - SPINE: Poller[0] DEBUG: PHP Script Server Pipes Closed
07/11/2011 03:58:51 PM - SPINE: Poller[0] DEBUG: Allocated Variable Memory Freed
07/11/2011 03:58:51 PM - SPINE: Poller[0] DEBUG: MYSQL Free & Close Completed
07/11/2011 03:58:51 PM - SPINE: Poller[0] Time: 24.8853 s, Threads: 50, Hosts: 2
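
In the run above, every data source after the first timeout is recorded as "U" (unknown). A short sketch like the following can summarize a long verbose run: how many values were collected, how many came back as "U", and which data source first timed out. `summarize_run` is a hypothetical helper, and the patterns assume the output format shown in this post.

```python
import re

# Patterns assume the spine verbose output format shown in this post.
VALUE_RE = re.compile(r"Host\[(\d+)\] DS\[(\d+)\] SNMP: .*oid: (\S+), value: (.+)$")
SPINE_TIMEOUT_RE = re.compile(r"Host\[(\d+)\] DS\[(\d+)\] WARNING: SNMP timeout detected")


def summarize_run(lines):
    """Count collected vs. unknown ('U') values and note the first timed-out DS id."""
    summary = {"ok": 0, "unknown": 0, "first_timeout_ds": None}
    for line in lines:
        line = line.rstrip()
        timeout = SPINE_TIMEOUT_RE.search(line)
        if timeout:
            if summary["first_timeout_ds"] is None:
                summary["first_timeout_ds"] = int(timeout.group(2))
            continue
        value = VALUE_RE.search(line)
        if value:
            key = "unknown" if value.group(4).strip() == "U" else "ok"
            summary[key] += 1
    return summary


# Lines copied from the run above.
run = [
    "07/11/2011 03:58:28 PM - SPINE: Poller[0] Host[4] DS[23] SNMP: v2: 157.159.8.2, dsname: traffic_in, oid: .1.3.6.1.2.1.2.2.1.10.1, value: 3462470578",
    "07/11/2011 03:58:39 PM - SPINE: Poller[0] Host[4] DS[147] SNMP: v2: 157.159.8.2, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.127, value: 3852501608",
    "07/11/2011 03:58:45 PM - SPINE: Poller[0] Host[4] DS[148] WARNING: SNMP timeout detected [1500 ms], ignoring host '157.159.8.2'",
    "07/11/2011 03:58:45 PM - SPINE: Poller[0] Host[4] DS[148] SNMP: v2: 157.159.8.2, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.128, value: U",
    "07/11/2011 03:58:51 PM - SPINE: Poller[0] Host[4] DS[154] SNMP: v2: 157.159.8.2, dsname: traffic_out, oid: .1.3.6.1.2.1.2.2.1.16.134, value: U",
]

result = summarize_run(run)
print(result)
```

To use it on a full run, redirect the spine output to a file and pass the open file object to `summarize_run`.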

Thanks for any help.
jehan.procaccia
Posts: 22
Joined: Tue Nov 13, 2007 10:19 am

Post by jehan.procaccia »

Forget it...
It turns out that my switch (c6500) was really overloaded!
CPU was at 98% due to an Xcast flood, so SNMP really was timing out...
This time Cacti indirectly helped me to solve a switch overload :-)
Now that the flood is over, Cacti + spine works fine :-)
DDJ
Posts: 1
Joined: Mon May 17, 2010 3:02 pm

Post by DDJ »

OK guys, I had the same problem here, but fixed it with the right config in the data template for my graph: in the "data source item" section, I set "minimum value = 0" and "maximum value = U", based on this guide: http://www.cacti.net/downloads/docs/htm ... mbers.html

Now "WARNING: SNMP timeout detected [500 ms], ignoring host 'x.x.x.x'" is gone from cacti.log. I hope this helps some of you.

Cacti 0.8.8a here, on Wheezy 7.1
NitrousOxyde
Posts: 20
Joined: Mon Sep 21, 2015 9:17 am
Location: Baku, Azerbaijan

Post by NitrousOxyde »

Hi comrades,

I've got the same problem: some RRDs just don't get updated each time the poller runs (spine in my case).

I tried deleting and re-adding the problematic hosts, but then other ones stop being updated. File ownership is the same on all files, both updating and non-updating RRDs (root:root), because poller.php runs as root. File mode is also 644 for all files, both updating and non-updating.

I've tried all possible methods, but it doesn't help. One property common to all the failing hosts is that they are DMVPN spokes and connect to the Cacti server "behind" the hub router via an IPSec tunnel. Interestingly, routers "behind" the hub router (i.e., working via direct/indirect Ethernet) do not fail in the way the spoke routers do.