Updated to 0.8.8b - suddenly get SNMP timeouts?

Howie
Cacti Guru User
Posts: 5508
Joined: Thu Sep 16, 2004 5:53 am
Location: United Kingdom

Post by Howie »

I just updated my 0.8.7g installation to 0.8.8b yesterday. I also updated Spine to 0.8.8b at the same time, and thold to the latest version.

Ever since then, some devices, but not all, are giving SNMP timeout errors. I've increased the timeout to 2000, 3000 and then 5000 ms with no change. Using Realtime gives graphs for the same DS with no problem at all.

I also seem to get stuff remaining in the poller_output table after polling, even if I truncate poller_output in between polls (?).

Any ideas on where to look? Has something changed that I missed in the last couple of versions?
Weathermap 0.98a is out! & QuickTree 1.0. Superlinks is over there now (and built-in to Cacti 1.x).
Some Other Cacti tweaks, including strip-graphs, icons and snmp/netflow stuff.
(Let me know if you have UK DevOps or Network Ops opportunities, too!)
Howie
Cacti Guru User
Posts: 5508
Joined: Thu Sep 16, 2004 5:53 am
Location: United Kingdom

Re: Updated to 0.8.8b - suddenly get SNMP timeouts?

Post by Howie »

Running spine manually for one of the hosts produces incorrect results, too:

(i.e. I have a 2-second timeout on SNMP and all the requests "time out", yet the ENTIRE SCRIPT takes less than 2 seconds to run?)

./spine --verbosity=5 -R -H 183

SPINE: Using spine config file [../etc/spine.conf]
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The path_php_server variable is /var/www/html/script_server.php
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The path_cactilog variable is /var/www/html/log/cacti.log
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The log_destination variable is 1 (FILE)
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The path_php variable is /usr/bin/php
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The availability_method variable is 2
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The ping_recovery_count variable is 3
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The ping_failure_count variable is 2
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The ping_method variable is 2
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The ping_retries variable is 3
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The ping_timeout variable is 1400
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The snmp_retries variable is 3
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The log_perror variable is 1
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The log_pwarn variable is 0
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The boost_redirect variable is 1
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The log_pstats variable is 0
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The threads variable is 30
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The polling interval is 300 seconds
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The number of concurrent processes is 1
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The script timeout is 10
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The number of php script servers to run is 10
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: Host List to be polled='183', TotalPHPScripts='0'
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The PHP Script Server is Not Required
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The Maximum SNMP OID Get Size is 100
09/11/2013 04:31:40 PM - SPINE: Poller[0] Version 0.8.8b starting
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: MySQL is Thread Safe!
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: Spine is running asroot.
09/11/2013 04:31:40 PM - SPINE: Poller[0] SPINE: Initializing Net-SNMP API
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: Issues with SNMP Header Version information, assuming old version of Net-SNMP.
09/11/2013 04:31:40 PM - SPINE: Poller[0] SPINE: Initializing PHP Script Server(s)
09/11/2013 04:31:40 PM - SPINE: Poller[0] NOTE: Spine will support multithread device polling.
09/11/2013 04:31:40 PM - SPINE: Poller[0] NOTE: Spine is behaving in a 0.8.7g+ manner
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: Initial Value of Active Threads is 0
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[0] TH[1] Total Time: 0.00046 Seconds
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[0] TH[1] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 2
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] DEBUG: Entering SNMP Ping
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] SNMP Result: Host responded to SNMP
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] TH[1] RECACHE: Processing 2 items in the auto reindex cache for '10.0.65.41'
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] TH[1] Recache DataQuery[1] OID: .1.3.6.1.2.1.1.3.0, output: 1035698069
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] TH[1] Recache DataQuery[16] OID: .1.3.6.1.2.1.1.3.0, output: 1035698069
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] TH[1] NOTE: There are '240' Polling Items for this Host
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] TH[1] DS[6799] WARNING: SNMP timeout detected [2000 ms], ignoring host '10.0.65.41'
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] TH[1] DS[6799] SNMP: v3: 10.0.65.41, dsname: swFCPortRxBadEofs, oid: .1.3.6.1.4.1.1588.2.1.1.1.6.2.1.25.18, value: U
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] TH[1] DS[6799] WARNING: SNMP timeout detected [2000 ms], ignoring host '10.0.65.41'
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] TH[1] DS[6799] SNMP: v3: 10.0.65.41, dsname: swFCPortRxTooLongs, oid: .1.3.6.1.4.1.1588.2.1.1.1.6.2.1.24.18, value: U
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] TH[1] DS[6799] WARNING: SNMP timeout detected [2000 ms], ignoring host '10.0.65.41'

[snipped many more repeated "timeouts"]

09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] TH[1] Total Time:  0.14 Seconds
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] TH[1] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 0
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: Thread Cleanup Complete
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: PHP Script Server Pipes Closed
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: Allocated Variable Memory Freed
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: MYSQL Free & Close Completed
09/11/2013 04:31:40 PM - SPINE: Poller[0] DEBUG: Net-SNMP Close Completed
09/11/2013 04:31:40 PM - SPINE: Poller[0] Time: 0.1952 s, Threads: 30, Hosts: 2
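
The mismatch in the output above (a 2000 ms timeout reported, yet the host finishes in 0.14 seconds) can be checked mechanically across a whole log. This is only a minimal sketch in Python, assuming the log line format pasted in this post; the function name and the regular expressions are illustrative and are not part of Spine.

```python
import re

# Patterns assume the Spine log format pasted above (an assumption, not a Spine API).
TIMEOUT_RE = re.compile(r"Host\[(\d+)\].*SNMP timeout detected \[(\d+) ms\]")
TOTAL_RE = re.compile(r"Host\[(\d+)\].*Total Time:\s*([\d.]+) Seconds")


def find_implausible_timeouts(log_text):
    """Map host id -> (timeout_ms, total_ms) for hosts whose whole poll
    finished faster than a single reported SNMP timeout."""
    timeouts, totals = {}, {}
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[match.group(1)] = int(match.group(2))
            continue
        match = TOTAL_RE.search(line)
        if match:
            # Spine reports seconds; convert to milliseconds for comparison.
            totals[match.group(1)] = float(match.group(2)) * 1000.0
    return {
        host: (timeout_ms, totals[host])
        for host, timeout_ms in timeouts.items()
        if host in totals and totals[host] < timeout_ms
    }


# Three lines copied from the output above.
SAMPLE = """\
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[0] TH[1] Total Time: 0.00046 Seconds
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] TH[1] DS[6799] WARNING: SNMP timeout detected [2000 ms], ignoring host '10.0.65.41'
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] TH[1] Total Time:  0.14 Seconds
"""

if __name__ == "__main__":
    for host, (timeout_ms, total_ms) in find_implausible_timeouts(SAMPLE).items():
        print(f"Host[{host}]: {timeout_ms} ms timeout reported, poll took {total_ms:.0f} ms")
```

A host flagged by this check cannot really have waited out its timeout, which is consistent with the "timeouts" being reported without any actual wait.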

Howie
Cacti Guru User
Posts: 5508
Joined: Thu Sep 16, 2004 5:53 am
Location: United Kingdom

Re: Updated to 0.8.8b - suddenly get SNMP timeouts?

Post by Howie »

Switching back to Spine 0.8.7g resolves this for me - I get a clean collection in about 10 seconds.
paulgevers
Cacti Pro User
Posts: 613
Joined: Tue Aug 29, 2006 4:09 pm
Location: NL

Re: Updated to 0.8.8b - suddenly get SNMP timeouts?

Post by paulgevers »

Does the discussion in http://bugs.cacti.net/view.php?id=2390 help in any way?
Maintainer of cacti in Debian (and Ubuntu).
Cacti 1.* is now officially supported on Debian Stretch via Debian backports
FAQ Ubuntu and Debian differences
Generic cacti debugging
Howie
Cacti Guru User
Posts: 5508
Joined: Thu Sep 16, 2004 5:53 am
Location: United Kingdom

Re: Updated to 0.8.8b - suddenly get SNMP timeouts?

Post by Howie »

No, this is with SNMP rather than scripts. It's as though Spine is using non-blocking SNMP calls suddenly.
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: Updated to 0.8.8b - suddenly get SNMP timeouts?

Post by gandalf »

Howie wrote: It's as though Spine is using non-blocking SNMP calls suddenly.
As Larry is the Spine guy, I can't answer this off the top of my head. You discovered this change as a difference between 0.8.7g and 0.8.8b, right? I suppose it's in 0.8.8a as well, then.
R.
Howie
Cacti Guru User
Posts: 5508
Joined: Thu Sep 16, 2004 5:53 am
Location: United Kingdom

Re: Updated to 0.8.8b - suddenly get SNMP timeouts?

Post by Howie »

gandalf wrote:
Howie wrote: It's as though Spine is using non-blocking SNMP calls suddenly.
As Larry is the Spine guy, I can't answer this off the top of my head. You discovered this change as a difference between 0.8.7g and 0.8.8b, right? I suppose it's in 0.8.8a as well, then.
R.
That I didn't test - I just re-installed the previously-working version. I'll try it on Monday (in read-only mode).
mozolins
Posts: 20
Joined: Thu Feb 21, 2013 3:14 am

Re: Updated to 0.8.8b - suddenly get SNMP timeouts?

Post by mozolins »

Having a similar issue on CentOS 6 with Spine 0.8.8b.

Log:
09/22/2013 09:40:14 AM - SPINE: Poller[0] Host[24] TH[3] DS[801] SNMP: v2: llpasagtw01.linearlinc.net, dsname: errors_out, oid: .1.3.6.1.2.1.2.2.1.20.14, value: U
09/22/2013 09:40:14 AM - SPINE: Poller[0] Host[24] TH[3] DS[802] WARNING: SNMP timeout detected [2000 ms], ignoring host 'llpasagtw01.linearlinc.net'

[root@llvc6mgr01 cacti]# php -ver
PHP 5.3.3 (cli) (built: Jul 12 2013 20:35:47)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies

Spine was built on this system, and the binary is SUID root.
The bug referenced in 2390 is not even close to this.
Howie
Cacti Guru User
Posts: 5508
Joined: Thu Sep 16, 2004 5:53 am
Location: United Kingdom

Re: Updated to 0.8.8b - suddenly get SNMP timeouts?

Post by Howie »

And just to confirm - Spine 0.8.8a does the same for me too.

[initial output chopped - but there are 240 poller items according to spine]
09/23/2013 02:34:04 PM - SPINE: Poller[0] Host[183] TH[1] DS[6699] WARNING: SNMP timeout detected [2000 ms], ignoring host '10.0.65.41'
09/23/2013 02:34:04 PM - SPINE: Poller[0] Host[183] TH[1] DS[6699] SNMP: v3: 10.0.65.41, dsname: swFCPortRxFrames, oid: .1.3.6.1.4.1.1588.2.1.1.1.6.2.1.14.15, value: U
09/23/2013 02:34:04 PM - SPINE: Poller[0] Host[183] TH[1] Total Time: 0.6 Seconds
09/23/2013 02:34:04 PM - SPINE: Poller[0] Host[183] TH[1] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
09/23/2013 02:34:04 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 0
09/23/2013 02:34:04 PM - SPINE: Poller[0] DEBUG: Thread Cleanup Complete
09/23/2013 02:34:04 PM - SPINE: Poller[0] DEBUG: PHP Script Server Pipes Closed
09/23/2013 02:34:04 PM - SPINE: Poller[0] DEBUG: Allocated Variable Memory Freed
09/23/2013 02:34:04 PM - SPINE: Poller[0] DEBUG: MYSQL Free & Close Completed
09/23/2013 02:34:04 PM - SPINE: Poller[0] DEBUG: Net-SNMP Close Completed
09/23/2013 02:34:04 PM - SPINE: Poller[0] Time: 0.6605 s, Threads: 30, Hosts: 2
mozolins
Posts: 20
Joined: Thu Feb 21, 2013 3:14 am

Re: Updated to 0.8.8b - suddenly get SNMP timeouts?

Post by mozolins »

Had even worse issues with 0.8.8a.

Could its SNMP be missing the MIBs? I don't recall it asking for them, and Net-SNMP is a bit cranky about them. If so, how do we test it?
Howie
Cacti Guru User
Posts: 5508
Joined: Thu Sep 16, 2004 5:53 am
Location: United Kingdom

Re: Updated to 0.8.8b - suddenly get SNMP timeouts?

Post by Howie »

I don't use named OIDs, just numbers. It seems to get the right numbers, and it doesn't fail for *every* device, although I haven't spotted the pattern. We use a lot of SNMPv3, but I think some v2c devices were failing too.
mozolins
Posts: 20
Joined: Thu Feb 21, 2013 3:14 am

Re: Updated to 0.8.8b - suddenly get SNMP timeouts?

Post by mozolins »

Was hoping we'd made some progress here. Spine is ignoring SNMP v2 devices. Where does Spine get the OIDs for its SNMP checks? Also, pings are failing.

Let me know if you need further data.

Martin :-?
mozolins
Posts: 20
Joined: Thu Feb 21, 2013 3:14 am

Re: Updated to 0.8.8b - suddenly get SNMP timeouts?

Post by mozolins »

These are devices that I've added the manufacturers' MIBs for: Cisco ASA and F5 BigIP.
Howie
Cacti Guru User
Posts: 5508
Joined: Thu Sep 16, 2004 5:53 am
Location: United Kingdom

Re: Updated to 0.8.8b - suddenly get SNMP timeouts?

Post by Howie »

The standard SNMP interface traffic template uses numeric OIDs, so do all of the ones I've added myself. There should be no need for a MIB file.

Also, the example I pasted up above appears to be an SNMPv3 device that's failing, not v2.
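
MIB files are only needed to translate named OIDs into numbers, so a quick way to rule MIBs out is to confirm that every OID being polled is already numeric. A rough sketch in Python; the helper name is made up for illustration, and the named OID at the end is a hypothetical example that does not come from this thread.

```python
import re

# A numeric OID is a run of dot-separated decimal arcs, optionally with a leading dot.
NUMERIC_OID_RE = re.compile(r"\.?\d+(?:\.\d+)*")


def is_numeric_oid(oid):
    """Return True if the OID consists only of numeric arcs (no MIB names)."""
    return NUMERIC_OID_RE.fullmatch(oid) is not None


if __name__ == "__main__":
    # The first two OIDs are copied from the Spine output earlier in the thread.
    for oid in (
        ".1.3.6.1.2.1.1.3.0",
        ".1.3.6.1.4.1.1588.2.1.1.1.6.2.1.25.18",
        "IF-MIB::ifOutErrors.14",  # hypothetical named form, would need a MIB
    ):
        print(oid, is_numeric_oid(oid))
```

Every OID in the logs pasted above passes this check, so a missing MIB looks like an unlikely cause here.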
mozolins
Posts: 20
Joined: Thu Feb 21, 2013 3:14 am

Re: Updated to 0.8.8b - suddenly get SNMP timeouts?

Post by mozolins »

The log entries I posted above were from an SNMP v2c device.
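
Since both v3 and v2c devices show up in the logs in this thread, one way to look for a pattern is to tally the failed results by SNMP version and host. A minimal sketch in Python, assuming the per-item result line format pasted in the posts above (Spine logs v2c as "v2" there); the function name and regular expression are illustrative only.

```python
import re
from collections import Counter

# Assumed format of Spine's per-item SNMP result lines, as pasted in this thread.
RESULT_RE = re.compile(
    r"SNMP: (v\w+): ([^,]+), dsname: ([^,]+), oid: ([.\d]+), value: (.*)$"
)


def count_unknown_by_version(log_text):
    """Count results whose value is 'U', keyed by (snmp_version, host)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RESULT_RE.search(line)
        if match and match.group(5).strip() == "U":
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Result lines copied from the posts above.
SAMPLE = """\
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] TH[1] DS[6799] SNMP: v3: 10.0.65.41, dsname: swFCPortRxBadEofs, oid: .1.3.6.1.4.1.1588.2.1.1.1.6.2.1.25.18, value: U
09/11/2013 04:31:40 PM - SPINE: Poller[0] Host[183] TH[1] DS[6799] SNMP: v3: 10.0.65.41, dsname: swFCPortRxTooLongs, oid: .1.3.6.1.4.1.1588.2.1.1.1.6.2.1.24.18, value: U
09/22/2013 09:40:14 AM - SPINE: Poller[0] Host[24] TH[3] DS[801] SNMP: v2: llpasagtw01.linearlinc.net, dsname: errors_out, oid: .1.3.6.1.2.1.2.2.1.20.14, value: U
09/23/2013 02:34:04 PM - SPINE: Poller[0] Host[183] TH[1] DS[6699] SNMP: v3: 10.0.65.41, dsname: swFCPortRxFrames, oid: .1.3.6.1.4.1.1588.2.1.1.1.6.2.1.14.15, value: U
"""

if __name__ == "__main__":
    for (version, host), failures in sorted(count_unknown_by_version(SAMPLE).items()):
        print(f"{version} {host}: {failures} unknown value(s)")
```

Run against a full cacti.log, a tally like this would show whether the failures cluster on one SNMP version or are spread across both.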