SNMP/Ping working but device status "down"

r1ch
Posts: 5
Joined: Mon Oct 24, 2011 9:07 am

SNMP/Ping working but device status "down"

Post by r1ch »

I've tried all fixes in previous threads but haven't cracked this problem yet...

Cacti and Spine 0.8.7e, Linux x64 with NET-SNMP.

I've added 3 Cisco 4510R+E switches without a problem, but the 4th one I've added (with an identical config and SNMPv3 setup on the switch) is showing as "down" in the device list.

Cacti to 4510R+E:
SNMP Info - works.
SNMP - Interface Statistics "verbose query" - works.
ICMP Ping - works.
TCP/UDP ping - not allowed.

Manual from cacti box to 4510R+E:
Snmpwalk - works.
Snmpgetnext for .1.3 - works (saw this was a fix for previous bug).
ICMP Ping - works.

Code:

SNMP Information
System:0e-UNIVERSALK9-M), Version 15.0(1)XO1, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport Copyright (c)
1986-2010 by Cisco Systems, Inc. Compiled Tue 14-Dec-10 22:12 by
Uptime: 24785654 (2 days, 20 hours, 50 minutes)
Hostname: xxxx
Location: xxxx
Contact: xxxx

Ping Results
ICMP Ping Success (0.961 ms) 
I can also see the access-list permit line on the switch being hit from the cacti IP.

Code:

xxxx #sh access-list 1
Standard IP access list 1
    10 permit x.x.x.x (6454 matches)
I've tried deleting and re-adding the device, increasing the timeouts (ping is <1 ms), and every combination of SNMP/Ping 'downed device detection'; they all result in the device being marked "down". If I turn 'downed device detection' off, the device goes 'up', so something is making Cacti think it's 'down'.

It looks like a bug, or a weird quirk. Thanks in advance for any help you can give. :)
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: SNMP/Ping working but device status "down"

Post by gandalf »

Please see the instructions at the 2nd link of my sig for more debugging. I need the cacti.log data from when the device is polled and checked for "downed host".
R.
r1ch
Posts: 5
Joined: Mon Oct 24, 2011 9:07 am

Re: SNMP/Ping working but device status "down"

Post by r1ch »

Thanks Gandalf, I read the link and followed the instructions.

This is the only entry in the log for the device.

Code:

10/24/2011 12:05:22 PM - SPINE: Poller[0] Host[731] Hostname[xxxx] ERROR: HOST EVENT: Host is DOWN Message: Host did not respond to SNMP

Code:

[xx@xxxx scripts]$ /usr/bin/perl ping.pl x.x.x.x
1.25

Code:

[xx@xxxx scripts]$ snmpwalk -v 3 -u xxx -l authPriv -a SHA -A xxx -x aes128 -X xxx x.x.x.x .1.3.6.1.4.1.9.9.109.1.1.1.1.6
SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.6.5000 = Gauge32: 5

Code:

[xx@xxxx rra]$ spine --verbosity=5 732 732
SPINE: Using spine config file [/etc/spine.conf]
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The path_php_server variable is /var/www/html/cacti/script_server.php
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The path_cactilog variable is /var/www/html/cacti/log/cacti.log
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The log_destination variable is 1 (FILE)
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The path_php variable is /usr/bin/php
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The availability_method variable is 2
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The ping_recovery_count variable is 3
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The ping_failure_count variable is 2
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The ping_method variable is 1
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The ping_retries variable is 1
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The ping_timeout variable is 400
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The snmp_retries variable is 3
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The log_perror variable is 1
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The log_pwarn variable is 1
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The boost_redirect variable is 0
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The log_pstats variable is 1
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The threads variable is 10
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The polling interval is 300 seconds
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The number of concurrent processes is 8
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The script timeout is 120
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The number of php script servers to run is 1
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: StartHost='732', EndHost='732', TotalPHPScripts='0'
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The PHP Script Server is Not Required
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: The Maximum SNMP OID Get Size is 30
10/25/2011 03:59:36 PM - SPINE: Poller[0] Version 0.8.7e starting
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: MySQL is Thread Safe!
10/25/2011 03:59:36 PM - SPINE: Poller[0] SPINE: Initializing Net-SNMP API
10/25/2011 03:59:36 PM - SPINE: Poller[0] DEBUG: Issues with SNMP Header Version information, assuming old version of Net-SNMP.
10/25/2011 03:59:37 PM - SPINE: Poller[0] SPINE: Initializing PHP Script Server(s)
10/25/2011 03:59:37 PM - SPINE: Poller[0] DEBUG: Initial Value of Active Threads is 0
10/25/2011 03:59:37 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
10/25/2011 03:59:37 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
10/25/2011 03:59:37 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
10/25/2011 03:59:37 PM - SPINE: Poller[0] Host[0] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
10/25/2011 03:59:37 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
10/25/2011 03:59:37 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 2
10/25/2011 03:59:37 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
10/25/2011 03:59:37 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
10/25/2011 03:59:37 PM - SPINE: Poller[0] Host[732] SNMP Result: Host responded to SNMP
10/25/2011 03:59:37 PM - SPINE: Poller[0] Host[732] RECACHE: Processing 2 items in the auto reindex cache for 'xxxx'
10/25/2011 03:59:37 PM - SPINE: Poller[0] Host[732] NOTE: There are '2' Polling Items for this Host
10/25/2011 03:59:37 PM - SPINE: Poller[0] Host[732] DS[10970] SNMP: v3: xxxx, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.350, value: 431931370405
10/25/2011 03:59:37 PM - SPINE: Poller[0] Host[732] DS[10970] SNMP: v3: xxxx, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.350, value: 92116970727
10/25/2011 03:59:37 PM - SPINE: Poller[0] Host[732] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
10/25/2011 03:59:37 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 0
10/25/2011 03:59:37 PM - SPINE: Poller[0] DEBUG: Thread Cleanup Complete
10/25/2011 03:59:37 PM - SPINE: Poller[0] DEBUG: PHP Script Server Pipes Closed
10/25/2011 03:59:37 PM - SPINE: Poller[0] DEBUG: Allocated Variable Memory Freed
10/25/2011 03:59:37 PM - SPINE: Poller[0] DEBUG: MYSQL Free & Close Completed
10/25/2011 03:59:37 PM - SPINE: Poller[0] Time: 0.4108 s, Threads: 10, Hosts: 2
And interestingly...

Graph debug:

Code:

RRDTool Command:

/usr/bin/rrdtool graph - \
--imgformat=PNG \
--start=-86400 \
--end=-300 \
--title="xxxx Po1 - (xxxx)" \
--rigid \
--base=1000 \
--height=120 \
--width=600 \
--alt-autoscale-max \
--lower-limit=0 \
--vertical-label="bits per second" \
--slope-mode \
--font TITLE:10: \
--font AXIS:8: \
--font LEGEND:8: \
--font UNIT:8: \
DEF:a="/var/www/html/cacti/rra/xxxx_traffic_in_10970.rrd":traffic_in:AVERAGE \
DEF:b="/var/www/html/cacti/rra/xxxx_traffic_in_10970.rrd":traffic_out:AVERAGE \
CDEF:cdefa=a,8,* \
CDEF:cdefg=b,8,* \
AREA:cdefa#00CF0019:""  \
LINE1:cdefa#00CF00FF:"Inbound"  \
GPRINT:cdefa:LAST:" Current\:%8.2lf %s"  \
GPRINT:cdefa:AVERAGE:"Average\:%8.2lf %s"  \
GPRINT:a:MAX:"Maximum\:%8.2lf %s"  \
COMMENT:"Total In\:  0 bytes\n"  \
AREA:cdefg#002A9719:""  \
LINE1:cdefg#002A97FF:"Outbound"  \
GPRINT:b:LAST:"Current\:%8.2lf %s"  \
GPRINT:b:AVERAGE:"Average\:%8.2lf %s"  \
GPRINT:cdefg:MAX:"Maximum\:%8.2lf %s"  \
COMMENT:"Total Out\: 0 bytes" 

RRDTool Says:
ERROR: opening '/var/www/html/cacti/rra/xxxx_traffic_in_10970.rrd': No such file or directory
So I checked the folder, and the .rrd files are not being created. This is a problem for sure, but is it the reason why the host is "down"?
TheWitness
Developer
Posts: 17059
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Re: SNMP/Ping working but device status "down"

Post by TheWitness »

Your log example was for host_id 731, but your spine run was against host_id 732. Also, RRD files are updated by poller.php, not by spine.
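
To avoid mixing up hosts when reading cacti.log, it can help to filter the log by one host ID at a time. A minimal Python sketch, assuming the log lines carry a `Host[<id>]` tag as in the excerpts in this thread; the function name `lines_for_host` and the sample text are illustrative only and are not part of Cacti:

```python
import re


def lines_for_host(log_text, host_id):
    """Return the log lines that mention exactly Host[<host_id>]."""
    # The closing bracket is part of the pattern, so 73 does not match 731.
    pattern = re.compile(r"Host\[%d\]" % host_id)
    return [line for line in log_text.splitlines() if pattern.search(line)]


# Sample lines copied from the log excerpts posted in this thread.
sample = """\
10/24/2011 12:05:22 PM - SPINE: Poller[0] Host[731] Hostname[xxxx] ERROR: HOST EVENT: Host is DOWN Message: Host did not respond to SNMP
10/25/2011 03:59:37 PM - SPINE: Poller[0] Host[732] SNMP Result: Host responded to SNMP
10/25/2011 03:59:37 PM - SPINE: Poller[0] Host[0] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
"""

for line in lines_for_host(sample, 732):
    print(line)
```

Running the same filter for 731 and 732 side by side makes it obvious when a log entry and a manual spine run refer to different device records.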
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?
r1ch
Posts: 5
Joined: Mon Oct 24, 2011 9:07 am

Re: SNMP/Ping working but device status "down"

Post by r1ch »

Good spot. I deleted and re-added the host in the hope that it would spark it into life, hence the 731 and 732. I've also added a host 733 which is showing the same problem - 'down' in the device list but SNMP working etc.

So, I looked through the log again and searched for "Host[73"

Code:

10/25/2011 04:24:22 PM - SPINE: Poller[0] Host[733] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
10/25/2011 04:24:22 PM - SPINE: Poller[0] Host[733] DS[10971] SNMP: v3: xxxx-2, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.263, value: 3856234179869
10/25/2011 04:24:22 PM - SPINE: Poller[0] Host[733] DS[10971] SNMP: v3: xxxx-2, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.263, value: 677856069108
10/25/2011 04:24:21 PM - SPINE: Poller[0] Host[733] NOTE: There are '2' Polling Items for this Host
10/25/2011 04:24:21 PM - SPINE: Poller[0] Host[733] RECACHE: Processing 2 items in the auto reindex cache for 'xxxx-2'
10/25/2011 04:24:21 PM - SPINE: Poller[0] Host[733] SNMP Result: Host responded to SNMP
10/25/2011 03:59:37 PM - SPINE: Poller[0] Host[732] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
10/25/2011 03:59:37 PM - SPINE: Poller[0] Host[732] DS[10970] SNMP: v3: xxxx, dsname: traffic_out, oid: .1.3.6.1.2.1.31.1.1.1.10.350, value: 92116970727
10/25/2011 03:59:37 PM - SPINE: Poller[0] Host[732] DS[10970] SNMP: v3: xxxx, dsname: traffic_in, oid: .1.3.6.1.2.1.31.1.1.1.6.350, value: 431931370405
10/25/2011 03:59:37 PM - SPINE: Poller[0] Host[732] NOTE: There are '2' Polling Items for this Host
10/25/2011 03:59:37 PM - SPINE: Poller[0] Host[732] RECACHE: Processing 2 items in the auto reindex cache for 'xxxx'
10/25/2011 03:59:37 PM - SPINE: Poller[0] Host[732] SNMP Result: Host responded to SNMP
10/24/2011 12:05:22 PM - SPINE: Poller[0] Host[731] Hostname[xxxx] ERROR: HOST EVENT: Host is DOWN Message: Host did not respond to SNMP
10/22/2011 02:45:18 PM - SPINE: Poller[0] Host[731] Hostname[xxxx] NOTICE: HOST EVENT: Host Returned from DOWN State 
The last line shows the host coming back "Up" on the 22nd, when I disabled 'downed device detection'; the line above it is from when I turned detection back on. I think it was about 4 PM yesterday (24th) that I deleted and re-added the host (which moved it to 732) and added the second device that's also not working (733).
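
Since the excerpt above is listed newest-first, the sequence of state changes is easier to follow once the HOST EVENT lines are sorted by timestamp. A rough Python sketch, assuming the `MM/DD/YYYY HH:MM:SS AM/PM` stamp format seen in these log lines; `parse_stamp` and `host_events_oldest_first` are hypothetical helper names, not Cacti functions:

```python
from datetime import datetime

# Timestamp layout assumed from the log lines quoted in this thread.
# Note: %p (AM/PM) parsing depends on the locale; the default C locale works.
STAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_stamp(line):
    """Parse the timestamp that precedes the first ' - ' in a log line."""
    stamp, _, _rest = line.partition(" - ")
    return datetime.strptime(stamp.strip(), STAMP_FORMAT)


def host_events_oldest_first(log_text):
    """Return only the HOST EVENT lines, sorted from oldest to newest."""
    events = [line for line in log_text.splitlines() if "HOST EVENT" in line]
    return sorted(events, key=parse_stamp)


# Two HOST EVENT lines from the excerpt, in the newest-first order they were posted.
events_sample = """\
10/24/2011 12:05:22 PM - SPINE: Poller[0] Host[731] Hostname[xxxx] ERROR: HOST EVENT: Host is DOWN Message: Host did not respond to SNMP
10/22/2011 02:45:18 PM - SPINE: Poller[0] Host[731] Hostname[xxxx] NOTICE: HOST EVENT: Host Returned from DOWN State
"""

for line in host_events_oldest_first(events_sample):
    print(line)
```

Only lines containing "HOST EVENT" are kept, so the per-datasource polling lines for hosts 732 and 733 drop out and the up/down history stands on its own.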

What do I need to check with poller.php?

Thank you very much for your help, this is really bugging me!
TheWitness
Developer
Posts: 17059
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Re: SNMP/Ping working but device status "down"

Post by TheWitness »

You are going to have to use Wireshark or the like to see what is going on.
r1ch
Posts: 5
Joined: Mon Oct 24, 2011 9:07 am

Re: SNMP/Ping working but device status "down"

Post by r1ch »

I'm happy to go ahead and start looking at tcpdump/wireshark level stuff, but with the manual snmp polls and pings working, haven't we already worked out that the cacti box can talk, and has the right snmp details to poll the device? If I'm wrong or not understanding something and there's something specific you're after with the packet capture, I'll go ahead and do it.

If it is pulling back the right info (which it seems to be), isn't this something I've got wrong in the Cacti settings/permissions, or a bug in how Cacti detects "downed" devices, possibly only when conditions x and y are present or something like that?

Really appreciate your, or anyone else's time, in looking into this. Cacti is a great tool that is getting better and better all the time and I'm really thankful for it. :)

EDIT: I've found the following at the bottom of /var/log/poller.log (after all the OKs):

Code:

10/26/2011 02:15:24 PM - SYSTEM STATS: Time:22.7911 Method:spine Processes:8 Threads:10 Hosts:422 HostsPerProcess:53 DataSources:14171 RRDsProcessed:5108
PHP Warning:  session_start(): open(/var/lib/php/session/sess_9d2psitrt9uie4uoeb5tmu8g16, O_RDWR) failed: Permission denied (13) in /var/www/html/cacti/include/global.php on line 149
PHP Warning:  session_start(): open(/var/lib/php/session/sess_beoegjrqfu3ier4o3c6rcjmvh2, O_RDWR) failed: Permission denied (13) in /var/www/html/cacti/include/global.php on line 149
PHP Warning:  Unknown: open(/var/lib/php/session/sess_chhg9m4lpq2odnpr5l69s1lgr3, O_RDWR) failed: Permission denied (13) in Unknown on line 0
PHP Warning:  Unknown: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/var/lib/php/session) in Unknown on line 0
PHP Warning:  Unknown: open(/var/lib/php/session/sess_beoegjrqfu3ier4o3c6rcjmvh2, O_RDWR) failed: Permission denied (13) in Unknown on line 0
PHP Warning:  Unknown: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/var/lib/php/session) in Unknown on line 0
PHP Warning:  Unknown: open(/var/lib/php/session/sess_9d2psitrt9uie4uoeb5tmu8g16, O_RDWR) failed: Permission denied (13) in Unknown on line 0
PHP Warning:  Unknown: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/var/lib/php/session) in Unknown on line 0
And when I try to run poller.php on its own I only get the following:

Code:

[xx@xxxx cacti]$ pwd
/var/www/html/cacti
[xx@xxxx cacti]$ php poller.php
Manage : initializing...
[xx@xxxx cacti]$
TheWitness
Developer
Posts: 17059
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Re: SNMP/Ping working but device status "down"

Post by TheWitness »

That's a permissions problem in the session directory. The poller is attempting to create a session file and cannot. Separately, there was a spine issue, corrected in 0.8.7h, that impacted SNMPv3.

TheWitness
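
A quick way to check the session directory is to look at its mode and ownership and to test write access as the account that runs the poller. A minimal Python sketch, assuming the `/var/lib/php/session` path from the warnings above; `describe_dir_access` is a hypothetical helper, and the result only reflects the user that runs the script:

```python
import os
import stat


def describe_dir_access(path):
    """Summarize ownership, mode and write access for a directory."""
    if not os.path.isdir(path):
        return {"path": path, "exists": False}
    info = os.stat(path)
    return {
        "path": path,
        "exists": True,
        "mode": stat.filemode(info.st_mode),  # e.g. 'drwxrwx---'
        "owner_uid": info.st_uid,
        "group_gid": info.st_gid,
        # Creating files in a directory needs both write and search permission.
        "writable_by_me": os.access(path, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    # Path taken from the PHP warnings in poller.log; adjust if
    # session.save_path points somewhere else on your system.
    print(describe_dir_access("/var/lib/php/session"))
```

Run it as the same user that cron uses for poller.php; if `writable_by_me` comes back False for that user, it matches the "Permission denied (13)" warnings in the log.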
r1ch
Posts: 5
Joined: Mon Oct 24, 2011 9:07 am

Re: SNMP/Ping working but device status "down"

Post by r1ch »

I see, thanks. What should the permissions be for that folder?

Is there a reason why it would only be happening with new switches I add? Cacti's been working for a year or so and still monitors and polls other switches fine, including other newly added 4510R+Es. It seems to be a problem that has just started happening, and I can't think of anything specific that's changed.
TheWitness
Developer
Posts: 17059
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Re: SNMP/Ping working but device status "down"

Post by TheWitness »

It depends on the SNMPv3 settings. First, I'm not certain that older versions of the PHP-SNMP module supported AES. This should be fixed in PHP 5.3. However, that would not impact Spine generally.

From Spine's perspective, if using "AuthNoPriv", there was a bug in the g release, corrected in the h release, that prevented pings from working. Lastly, we only support AES up to 128 bits.