[SOLVED] New device can't get SNMP... or can it?

dampersand
Posts: 10
Joined: Wed Feb 24, 2016 4:28 pm

Post by dampersand »

Hi!

I inherited a very messy cactiEZ system a few weeks back, and I've been up and down the documentation (and this forum) for eons getting everything we need sussed out. Thanks to the very useful information on this forum (and the multitudes of people having similar problems to mine), I have managed to set our system up ALMOST 100% the way it needs to be.

I am stuck on ONE final problem. This is a problem that I've seen quite a few times on the forums, but somehow nothing seems to work for me:

I have a new device that needs to be graphed, but the darn thing won't play ball - it shows as 'down' in the device list (unless I use ICMP ping, more on that in a moment), and will not populate any of the graphs assigned to it. The worst part is, it is set up in cacti EXACTLY the same as another device of the same type, name, and location... and the OTHER device works flawlessly. The only difference that I know of is that the working device was created before I got here, and the nonworking device was created new.

Here's a list of information:

1. Device shows as 'down' in device list while using "SNMP Uptime" as its Downed Device Detection. When I switch to Ping Or SNMP, the device comes 'up' in the device list (but graphs still won't populate).
2. SNMP information on the 'device' page properly populates.
3. Running the 'verbose query' on SNMP - Interface Statistics returns a full list of information.
4. I can manually snmpwalk from the server, and run snmpgetnext on .1 and .1.3 as well.
5. I have double- and triple-checked cacti's SNMP authorization settings. They match. They also match on the other device, which is working.
6. The switch configs for both devices are identical, save for obvious things (their IP addresses, etc.).
7. poller.php is being run as root. I can run the darn thing manually, too, no problem... but it doesn't help.
8. The rra directory is set to allow apache:apache (our cacti user)

As I mentioned in point 1, when I use ICMP as downed detection, the device shows as 'up' and moves forward. The debug loglevel of cacti gives the following information pertaining to host 97 (the culprit):

Code:

03/11/2016 01:50:22 PM - SPINE: Poller[0] Host[97] DEBUG: Entering ICMP Ping
03/11/2016 01:50:22 PM - SPINE: Poller[0] Host[97] DEBUG: ICMP Host Alive, Try Count:1, Time:0.1950 ms
03/11/2016 01:50:22 PM - SPINE: Poller[0] Host[97] PING Result: ICMP: Host is Alive
03/11/2016 01:50:22 PM - SPINE: Poller[0] Host[97] SNMP Result: SNMP not performed due to setting or ping result
03/11/2016 01:50:22 PM - SPINE: Poller[0] Host[97] TH[1] RECACHE: Processing 1 items in the auto reindex cache for '10.129.0.154'

...Some artifact data from weathermap, the noisiest plugin on the planet...

03/11/2016 01:50:30 PM - SPINE: Poller[0] Host[97] TH[1] Recache DataQuery[3] OID: .1.3.6.1.2.1.1.3.0, output: U
03/11/2016 01:50:30 PM - SPINE: Poller[0] Host[97] TH[1] NOTE: There are '39' Polling Items for this Host
03/11/2016 01:50:30 PM - SPINE: Poller[0] Host[97] TH[1] Total Time:     8 Seconds
03/11/2016 01:50:30 PM - SPINE: Poller[0] Host[97] TH[1] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
03/11/2016 01:50:30 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 0
03/11/2016 01:50:30 PM - SPINE: Poller[0] DEBUG: Thread Cleanup Complete
03/11/2016 01:50:30 PM - SPINE: Poller[0] DEBUG: SS[0] Script Server Shutdown Started
03/11/2016 04:50:30 PM - PHPSVR: Poller[0] DEBUG: PHP Script Server Shutdown request received, exiting

Now, as I understand it, the 'output: U' in this case suggests there are no updates, which is blatantly incorrect. You may also notice the goofed-up timestamp at the very end - I think that's due to weathermap, which is configured to use a timezone 3 hours ahead.

Switching Downed Device Detection to "SNMP Only" returns the following in logs:

Code:

03/11/2016 09:32:20 PM - SPINE: Poller[0] Host[97] DEBUG: Entering SNMP Ping
03/11/2016 01:57:26 PM - SPINE: Poller[0] Host[97] SNMP Ping Error: Unknown error: 2
03/11/2016 01:57:26 PM - SPINE: Poller[0] Host[97] SNMP Result: Host did not respond to SNMP
03/11/2016 01:57:26 PM - SPINE: Poller[0] Host[97] TH[1] NOTE: There are '39' Polling Items for this Host
03/11/2016 01:57:26 PM - SPINE: Poller[0] Host[97] TH[1] Total Time:     2 Seconds
03/11/2016 01:57:26 PM - SPINE: Poller[0] Host[97] TH[1] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
03/11/2016 01:57:26 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 0
03/11/2016 01:57:26 PM - SPINE: Poller[0] DEBUG: Thread Cleanup Complete
03/11/2016 01:57:26 PM - SPINE: Poller[0] DEBUG: SS[0] Script Server Shutdown Started

A tcpdump shows the following (IP address changed to 'host' and local name changed to cacti1 to protect the innocent):

Code:

[root@cacti1 log]# tcpdump -s 1500 host host
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 1500 bytes
14:03:19.185252 IP cacti1.XXX.net.52359 > host.snmp:  F=r U= E=  C= GetRequest(14) 
14:03:19.185566 IP host.snmp > cacti1.XXX.net.52359:  F= U= E= 0xF50x710x7F0x000x1C0x730x850x990x8B0x00 C= Report(32)  S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsUnknownEngineIDs.0=21276
14:03:19.185644 IP cacti1.XXX.net.52359 > host.snmp:  F=apr U=cacti2 [!scoped PDU]3e_04_82_92_c5_e8_0d_04_ff_de_ae_a3_17_ed_ea_0b_5b_b5_79_c0_bb_ba_5a_ae_6c_5e_3f_1a_48_e1_06_f7_3c_f4_8b_b6_3d_bc_44_09_76_f8_36_2d_f7_3a_89_00
14:03:19.185949 IP host.snmp > cacti1.XXX.net.52359:  F=a U=cacti2 E= 0xF50x710x7F0x000x1C0x730x850x990x8B0x00 C= Report(30)  S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsNotInTimeWindows.0=81665
14:03:21.187061 IP cacti1.imhadmin.net.52359 > host.snmp:  F=apr U=cacti2 [!scoped PDU]a5_2a_7e_59_da_f1_a4_2f_27_8b_cd_72_5c_35_b9_cd_db_87_70_8e_92_4e_b8_60_9c_1a_1e_93_35_d0_89_ac_9f_80_24_5c_16_32_3c_19_ba_d7_a0_b7_82_21_51_f4
14:03:21.187385 IP host.snmp > cacti1.XXX.net.52359:  F=a U=cacti2 E= 0xF50x710x7F0x000x1C0x730x850x990x8B0x00 C= Report(30)  S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsNotInTimeWindows.0=81666
14:03:23.188929 IP cacti1.XXX.net.52359 > host.snmp:  F=apr U=cacti2 [!scoped PDU]11_68_56_28_a1_90_90_62_d7_8a_c6_e7_ec_33_8a_ed_42_57_fd_8e_9e_47_24_97_99_85_25_c6_25_7c_cc_09_e7_41_53_0e_34_5d_88_9b_2a_ed_d3_60_78_47_c8_87
14:03:23.189254 IP host.snmp > cacti1.XXX.net.52359:  F=a U=cacti2 E= 0xF50x710x7F0x000x1C0x730x850x990x8B0x00 C= Report(30)  S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsNotInTimeWindows.0=81667
14:03:25.190145 IP cacti1.XXX.net.52359 > host.snmp:  F=apr U=cacti2 [!scoped PDU]d0_42_d8_6b_be_f7_20_1a_91_a2_12_e8_f0_c0_7e_9f_be_24_30_1c_1e_08_68_64_1a_c4_6b_4e_85_61_52_8b_5a_d5_26_5d_eb_5b_31_f7_cb_22_3d_78_f5_de_cb_74
14:03:25.190473 IP host.snmp > cacti1.XXX.net.52359:  F=a U=cacti2 E= 0xF50x710x7F0x000x1C0x730x850x990x8B0x00 C= Report(30)  S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsNotInTimeWindows.0=81668

TCPdump seems okay to me.
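
For anyone reading the dump above, here is a minimal sketch that tallies which USM counter each Report PDU from the switch names. The regex and the function name `usm_report_counters` are my own, and the sample lines are trimmed copies of the dump (engine ID and encrypted payload omitted). For reference, RFC 3414 defines usmStatsNotInTimeWindows as the count of packets dropped because they fell outside the authoritative engine's time window, while a usmStatsUnknownEngineIDs report is the normal answer to engine ID discovery. This only summarizes what the switch reported and does not by itself prove a cause.

```python
import re
from collections import defaultdict

# Matches the tail of a tcpdump SNMPv3 Report line, for example:
# "... C= Report(30)  S:snmpUsmMIB...usmStats.usmStatsNotInTimeWindows.0=81665"
REPORT_RE = re.compile(r"Report\(\d+\)\s+S:\S*\.(usmStats\w+)\.0=(\d+)")


def usm_report_counters(lines):
    """Return {counter_name: [values seen]} for every Report PDU in the dump."""
    seen = defaultdict(list)
    for line in lines:
        match = REPORT_RE.search(line)
        if match:
            seen[match.group(1)].append(int(match.group(2)))
    return dict(seen)


# Trimmed copies of the Report lines from the dump above.
sample = [
    "14:03:19.185566 IP host.snmp > cacti1.XXX.net.52359:  C= Report(32)  "
    "S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsUnknownEngineIDs.0=21276",
    "14:03:19.185949 IP host.snmp > cacti1.XXX.net.52359:  C= Report(30)  "
    "S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsNotInTimeWindows.0=81665",
    "14:03:21.187385 IP host.snmp > cacti1.XXX.net.52359:  C= Report(30)  "
    "S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsNotInTimeWindows.0=81666",
    "14:03:23.189254 IP host.snmp > cacti1.XXX.net.52359:  C= Report(30)  "
    "S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsNotInTimeWindows.0=81667",
    "14:03:25.190473 IP host.snmp > cacti1.XXX.net.52359:  C= Report(30)  "
    "S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsNotInTimeWindows.0=81668",
]

for name, values in usm_report_counters(sample).items():
    print(name, values)
```

In this capture every authenticated request is answered with a Report rather than a Response, and the NotInTimeWindows counter goes up by one each time.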

I have been up and down these forums and I'm completely at a loss here... anyone have any ideas?

Halp!
Last edited by dampersand on Tue Mar 15, 2016 6:29 pm, edited 1 time in total.
dampersand
Posts: 10
Joined: Wed Feb 24, 2016 4:28 pm

Re: New device can't get SNMP... or can it?

Post by dampersand »

I'm sorry, additional information:

cacti is currently 0.8.8b
rrdtool is 1.3.8
running centos 6.7

halp!
dampersand
Posts: 10
Joined: Wed Feb 24, 2016 4:28 pm

Re: New device can't get SNMP... or can it?

Post by dampersand »

Code:

[root@lax-t3x-pro-cacti1 html]# snmpgetnext -v3 -l authPriv -u cactiuser -a SHA -A <redacted> -x DES -X <redacted> HostIP .1
SNMPv2-MIB::sysDescr.0 = STRING: Arista Networks EOS version 4.14.9M running on an Arista Networks DCS-7050T-64

So, uh, SNMP ping should work, right?
micke2k
Cacti User
Posts: 261
Joined: Wed Feb 03, 2016 3:38 pm

Re: New device can't get SNMP... or can it?

Post by micke2k »

Tried switching to snmp v2c?
dampersand
Posts: 10
Joined: Wed Feb 24, 2016 4:28 pm

Re: New device can't get SNMP... or can it?

Post by dampersand »

Thanks for the idea! I have tried this, but alas, no luck. I've even re-entered all the snmp settings in both the switch config and cacti, and double/triple checked them by snmpgetnexting from CLI. Either way, the 'bad' device is configured the same as the 'good' device... and when I switch the hostname in 'devices' to point to the 'good' device, it will happily register as up (and make new rrds).

I was going through some of the old posts again and hit on one that I'd missed: the suggestion that cacti's maximum for values gotten from snmp is set too low. It looks like cacti is trying to snmpget uptime, which, when _I_ get it, is a number larger than 1e9 (cacti's default max, right?). This might explain why the poller gets a "U" when grabbing that OID. I thought this maximum only pertained to graphs, though, so I'm not 100% on it.

I'm not at the office, but I'm going to try editing that maximum when I return. Meanwhile, any ideas are appreciated!
dampersand
Posts: 10
Joined: Wed Feb 24, 2016 4:28 pm

Re: New device can't get SNMP... or can it?

Post by dampersand »

NOPE.

So, no luck. I'm more and more sure that this uptime OID, .1.3.6.1.2.1.1.3.0, is the problem. I can snmpget it no problem from the command line, but it's way over 1e9 timeticks. When poller.php tries to get the thing, it returns "U".

Guys... it's way over ONE HUNDRED FORTY FIVE DAYS. SURELY this isn't too high to poll?!
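
To put numbers on that: sysUpTime is an SNMP TimeTicks value, counted in hundredths of a second and held in a 32-bit unsigned integer. A quick sketch of the conversion follows. The helper names are mine, and the 1e9 figure is only the maximum I assumed above, not a confirmed Cacti setting.

```python
TICKS_PER_SECOND = 100        # TimeTicks are hundredths of a second
SECONDS_PER_DAY = 86_400
TIMETICKS_MAX = 2**32 - 1     # TimeTicks is a 32-bit unsigned value


def timeticks_to_days(ticks):
    """Convert an SNMP TimeTicks value to days."""
    return ticks / TICKS_PER_SECOND / SECONDS_PER_DAY


def days_to_timeticks(days):
    """Convert whole days to SNMP TimeTicks."""
    return days * SECONDS_PER_DAY * TICKS_PER_SECOND


print(round(timeticks_to_days(1_000_000_000), 1))  # 115.7 days for the assumed 1e9 maximum
print(days_to_timeticks(145))                      # 1252800000 ticks for 145 days
print(round(timeticks_to_days(TIMETICKS_MAX), 1))  # 497.1 days before the counter wraps
```

So 145 days of uptime is indeed past 1e9 timeticks, but still well inside what the TimeTicks type itself can hold.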
dampersand
Posts: 10
Joined: Wed Feb 24, 2016 4:28 pm

Re: New device can't get SNMP... or can it?

Post by dampersand »

More updates for those following along:

-There are other switches on the network with higher uptime.
-Switching to cmd.php instead of spine DOES register the device as up (however, it breaks damn near everything else - every data source has a 'partial result', so this isn't really a viable option).

This suggests maybe a bug in spine?

Does anyone have any ideas? I'd rather not bug the devs with a bug report if it's something I can handle.
dampersand
Posts: 10
Joined: Wed Feb 24, 2016 4:28 pm

Re: New device can't get SNMP... or can it?

Post by dampersand »

Everything I do seems to be one step forward, two steps back. :(

I find now that switching to cmd.php and then back to spine has effectively turned off ALL graphing. What.

Graphs simply do not update. I've turned logging up to debug, and there seem to be no errors - spine correctly snmpwalks, correctly calls cacti2rrd, and cacti2rrd correctly calls rrdtool - but the rrd doesn't update.

Permissions are set correctly. Everything's being run as root. What the devil is going on here?
dampersand
Posts: 10
Joined: Wed Feb 24, 2016 4:28 pm

Re: New device can't get SNMP... or can it?

Post by dampersand »

Since this is SUCH a popular thread, some updates:

The 'lol nothing will graph' turned out to be a timezone issue - the guy that set this up originally was on the East Coast, so the php.ini was set to his timezone. When I jumped between cmd.php and spine, everything was updated according to East Coast time for a couple of cycles, then went back to West Coast time. This caused a bunch of "I'm not allowed to update the past" errors. That's done.
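
A rough illustration of why a few cycles stamped three hours ahead block everything after them: rrdtool refuses any update whose timestamp is not later than the last one already stored in the RRD. The epoch value, the 300-second polling step, and the function names below are all made up for the sketch.

```python
STEP = 300             # assumed 5-minute polling interval
EAST_AHEAD = 3 * 3600  # East Coast wall clock is 3 hours ahead of West Coast


def rrd_accepts(last_update, new_ts):
    """rrdtool only takes an update stamped strictly later than the previous one."""
    return new_ts > last_update


def replay(stamps, last_update=0):
    """Feed timestamps in order and record which ones would be accepted."""
    results = []
    for ts in stamps:
        ok = rrd_accepts(last_update, ts)
        if ok:
            last_update = ts
        results.append(ok)
    return results


base = 1_457_978_400  # arbitrary epoch value, for illustration only
stamps = [
    base,                          # normal West Coast poll
    base + STEP + EAST_AHEAD,      # two polls stamped three hours ahead
    base + 2 * STEP + EAST_AHEAD,
    base + 3 * STEP,               # back to West Coast stamps
    base + 4 * STEP,
]
print(replay(stamps))  # [True, True, True, False, False]
```

Once the stamps drop back, every update is rejected until the clock passes the last "future" timestamp.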

As for the current issue, I'm still stuck, and would absolutely love another human's input. I've updated spine to 0.8.8b, and I've been adding little debug phrases to determine where the error occurs. It turns out that in snmp.c's snmp_get function, there is a call to net-snmp in the form of:

Code:

status = snmp_sess_synch_response(current_host->snmp_session, pdu, &response);

Before this call, current_host->snmp_status comes back as 0; after this call, current_host->snmp_status comes back as 2.

I've been crawling net-snmp to figure out where this error number came from in the hopes that it would give me some hints, but the only error list I can find for net-snmp involves negative numbers (and I don't see a switch from signed to unsigned ints here anywhere). Trouble is, I'm not very good with c, and I absolutely can't figure out how snmp_sess_synch_response is returning 2 instead of 1.
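
One tentative reading of that 2: net-snmp's synchronous response helpers return a small status code that is separate from the negative SNMPERR_* library errors. As far as I can tell from snmp_client.h, the values are STAT_SUCCESS (0), STAT_ERROR (1) and STAT_TIMEOUT (2), which would make 2 a timeout and 0, not 1, the success value. The lookup below is just a sketch of that mapping (the dict and function names are mine); check the numbers against the headers of your installed net-snmp.

```python
# Status codes as I understand them from net-snmp's snmp_client.h.
# Treat these as an assumption and verify against your own headers.
NETSNMP_STAT = {
    0: "STAT_SUCCESS",
    1: "STAT_ERROR",
    2: "STAT_TIMEOUT",
}


def describe_status(status):
    """Translate a synch-response status code into its symbolic name."""
    return NETSNMP_STAT.get(status, f"unknown status {status}")


print(describe_status(0))  # STAT_SUCCESS
print(describe_status(2))  # STAT_TIMEOUT
```

Read that way, "Unknown error: 2" in the spine log would mean no usable response arrived before the timeout, not a numbered net-snmp library error.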

...anyone?
dampersand
Posts: 10
Joined: Wed Feb 24, 2016 4:28 pm

Re: New device can't get SNMP... or can it?

Post by dampersand »

OH MAN IT'S GETTING WORSE.

After all the various tweaks I've made, now snmpv2c DOES work. Unfortunately this isn't really acceptable, since we're spitting traffic over the internet and encryption is required.

So why would snmpv3 work sometimes, but not other times?
dampersand
Posts: 10
Joined: Wed Feb 24, 2016 4:28 pm

[SOLVED] Re: New device can't get SNMP... or can it?

Post by dampersand »

Alright.

Barking up the snmpv2 tree:

I added an old community string to the switch config.

Spine started polling correctly on SNMPV3. Now, I'm pretty sure snmpv3 doesn't NEED community strings, but whatever.

Just to check, I REMOVED the community string from the switch config.

SPINE CONTINUED TO WORK CORRECTLY.

That's it, I'm out. Abandon thread.