I inherited a very messy CactiEZ system a few weeks back, and I've been up and down the documentation (and this forum) for eons getting everything we need sussed out. Thanks to the very useful information on this forum (and the many people who have had problems similar to mine), I have managed to set our system up ALMOST 100% the way it needs to be.
I am stuck on ONE final problem. This is a problem that I've seen quite a few times on the forums, but somehow nothing seems to work for me:
I have a new device that needs to be graphed, but the darn thing won't play ball: it shows as 'down' in the device list (unless I use ICMP ping, more on that in a moment), and none of the graphs assigned to it will populate. The worst part is that it is set up in Cacti EXACTLY the same as another device of the same type, name, and location... and the OTHER device works flawlessly. The only difference that I know of is that the working device was created before I got here, and the nonworking device was newly created.
Here's a list of information:
1. Device shows as 'down' in device list while using "SNMP Uptime" as its Downed Device Detection. When I switch to Ping Or SNMP, the device comes 'up' in the device list (but graphs still won't populate).
2. SNMP information on the 'device' page properly populates.
3. Running the 'verbose query' on SNMP - Interface Statistics returns a full list of information.
4. I can run snmpwalk manually from the server, and snmpgetnext on .1 and .1.3 works as well.
5. I have double- and triple-checked Cacti's SNMP authorization settings. They match. They also match on the other device, which is working.
6. The switch configs for both devices are identical, save for the obvious things (their IP addresses, etc.).
7. poller.php is being run as root. I can run the darn thing manually, too, no problem... but it doesn't help.
8. The rra directory is set to allow apache:apache (our Cacti user).
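For reference, the manual checks in point 4 look roughly like the sketch below. It assumes the net-snmp command-line tools and SNMPv3 at the authPriv level. The user name cacti2 and the address 10.129.0.154 come from the output further down; the SHA/AES protocols, the passphrase placeholders, and the OID used for the walk are assumptions for illustration only, not our real settings.

```python
import shlex


def build_snmp_cmd(tool, host, user, auth_pass, priv_pass, oid,
                   auth_proto="SHA", priv_proto="AES"):
    """Build the argv for a net-snmp SNMPv3 authPriv query.

    auth_proto and priv_proto are assumed defaults; substitute whatever
    the device is actually configured with.
    """
    return [
        tool, "-v3", "-l", "authPriv",
        "-u", user,
        "-a", auth_proto, "-A", auth_pass,
        "-x", priv_proto, "-X", priv_pass,
        host, oid,
    ]


if __name__ == "__main__":
    # Placeholder passphrases; the walk OID (system subtree) is an assumption.
    checks = (
        ("snmpwalk", ".1.3.6.1.2.1.1"),
        ("snmpgetnext", ".1"),
        ("snmpgetnext", ".1.3"),
    )
    for tool, oid in checks:
        argv = build_snmp_cmd(tool, "10.129.0.154", "cacti2",
                              "AUTH_PASSPHRASE", "PRIV_PASSPHRASE", oid)
        # Print a copy-pasteable command line rather than running it.
        print(shlex.join(argv))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; the printed lines can be pasted into a shell on the Cacti server.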
As I mentioned in point 1, when I use ICMP for Downed Device Detection, the device shows as 'up' and polling moves forward. With Cacti's log level set to debug, I get the following for host 97 (the culprit):
03/11/2016 01:50:22 PM - SPINE: Poller[0] Host[97] DEBUG: Entering ICMP Ping
03/11/2016 01:50:22 PM - SPINE: Poller[0] Host[97] DEBUG: ICMP Host Alive, Try Count:1, Time:0.1950 ms
03/11/2016 01:50:22 PM - SPINE: Poller[0] Host[97] PING Result: ICMP: Host is Alive
03/11/2016 01:50:22 PM - SPINE: Poller[0] Host[97] SNMP Result: SNMP not performed due to setting or ping result
03/11/2016 01:50:22 PM - SPINE: Poller[0] Host[97] TH[1] RECACHE: Processing 1 items in the auto reindex cache for '10.129.0.154'
...Some artifact data from weathermap, the noisiest plugin on the planet...
03/11/2016 01:50:30 PM - SPINE: Poller[0] Host[97] TH[1] Recache DataQuery[3] OID: .1.3.6.1.2.1.1.3.0, output: U
03/11/2016 01:50:30 PM - SPINE: Poller[0] Host[97] TH[1] NOTE: There are '39' Polling Items for this Host
03/11/2016 01:50:30 PM - SPINE: Poller[0] Host[97] TH[1] Total Time: 8 Seconds
03/11/2016 01:50:30 PM - SPINE: Poller[0] Host[97] TH[1] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
03/11/2016 01:50:30 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 0
03/11/2016 01:50:30 PM - SPINE: Poller[0] DEBUG: Thread Cleanup Complete
03/11/2016 01:50:30 PM - SPINE: Poller[0] DEBUG: SS[0] Script Server Shutdown Started
03/11/2016 04:50:30 PM - PHPSVR: Poller[0] DEBUG: PHP Script Server Shutdown request received, exiting
Switching Downed Device Detection to "SNMP Only" returns the following in logs:
03/11/2016 09:32:20 PM - SPINE: Poller[0] Host[97] DEBUG: Entering SNMP Ping
03/11/2016 01:57:26 PM - SPINE: Poller[0] Host[97] SNMP Ping Error: Unknown error: 2
03/11/2016 01:57:26 PM - SPINE: Poller[0] Host[97] SNMP Result: Host did not respond to SNMP
03/11/2016 01:57:26 PM - SPINE: Poller[0] Host[97] TH[1] NOTE: There are '39' Polling Items for this Host
03/11/2016 01:57:26 PM - SPINE: Poller[0] Host[97] TH[1] Total Time: 2 Seconds
03/11/2016 01:57:26 PM - SPINE: Poller[0] Host[97] TH[1] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
03/11/2016 01:57:26 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 0
03/11/2016 01:57:26 PM - SPINE: Poller[0] DEBUG: Thread Cleanup Complete
03/11/2016 01:57:26 PM - SPINE: Poller[0] DEBUG: SS[0] Script Server Shutdown Started
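Since the debug log is so noisy, here is a rough sketch of how the ping and SNMP verdicts for a single host can be pulled out of it. The regular expression is only inferred from the log lines pasted above, so it is an assumption about the format rather than anything official, and the function name is my own.

```python
import re

# Pattern inferred from the SPINE lines above; not an official log format.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M) - "
    r"(?P<source>\w+): Poller\[\d+\] Host\[(?P<host>\d+)\] "
    r"(?:TH\[\d+\] )?(?P<msg>.*)$"
)

# Message prefixes that carry the availability verdict.
KEYWORDS = ("PING Result:", "SNMP Result:", "SNMP Ping Error:")


def host_results(lines, host_id):
    """Return (timestamp, message) pairs with the ping/SNMP verdicts for one host."""
    results = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # lines without a Host[...] tag, plugin noise, etc.
        if int(match.group("host")) != host_id:
            continue
        if match.group("msg").startswith(KEYWORDS):
            results.append((match.group("ts"), match.group("msg")))
    return results


if __name__ == "__main__":
    sample = [
        "03/11/2016 01:57:26 PM - SPINE: Poller[0] Host[97] SNMP Ping Error: Unknown error: 2",
        "03/11/2016 01:57:26 PM - SPINE: Poller[0] Host[97] SNMP Result: Host did not respond to SNMP",
        "03/11/2016 01:57:26 PM - SPINE: Poller[0] Host[97] TH[1] Total Time: 2 Seconds",
    ]
    for ts, msg in host_results(sample, 97):
        print(ts, "|", msg)
```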
And here is a tcpdump of the SNMP exchange with the device:
[root@cacti1 log]# tcpdump -s 1500 host host
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 1500 bytes
14:03:19.185252 IP cacti1.XXX.net.52359 > host.snmp: F=r U= E= C= GetRequest(14)
14:03:19.185566 IP host.snmp > cacti1.XXX.net.52359: F= U= E= 0xF50x710x7F0x000x1C0x730x850x990x8B0x00 C= Report(32) S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsUnknownEngineIDs.0=21276
14:03:19.185644 IP cacti1.XXX.net.52359 > host.snmp: F=apr U=cacti2 [!scoped PDU]3e_04_82_92_c5_e8_0d_04_ff_de_ae_a3_17_ed_ea_0b_5b_b5_79_c0_bb_ba_5a_ae_6c_5e_3f_1a_48_e1_06_f7_3c_f4_8b_b6_3d_bc_44_09_76_f8_36_2d_f7_3a_89_00
14:03:19.185949 IP host.snmp > cacti1.XXX.net.52359: F=a U=cacti2 E= 0xF50x710x7F0x000x1C0x730x850x990x8B0x00 C= Report(30) S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsNotInTimeWindows.0=81665
14:03:21.187061 IP cacti1.imhadmin.net.52359 > host.snmp: F=apr U=cacti2 [!scoped PDU]a5_2a_7e_59_da_f1_a4_2f_27_8b_cd_72_5c_35_b9_cd_db_87_70_8e_92_4e_b8_60_9c_1a_1e_93_35_d0_89_ac_9f_80_24_5c_16_32_3c_19_ba_d7_a0_b7_82_21_51_f4
14:03:21.187385 IP host.snmp > cacti1.XXX.net.52359: F=a U=cacti2 E= 0xF50x710x7F0x000x1C0x730x850x990x8B0x00 C= Report(30) S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsNotInTimeWindows.0=81666
14:03:23.188929 IP cacti1.XXX.net.52359 > host.snmp: F=apr U=cacti2 [!scoped PDU]11_68_56_28_a1_90_90_62_d7_8a_c6_e7_ec_33_8a_ed_42_57_fd_8e_9e_47_24_97_99_85_25_c6_25_7c_cc_09_e7_41_53_0e_34_5d_88_9b_2a_ed_d3_60_78_47_c8_87
14:03:23.189254 IP host.snmp > cacti1.XXX.net.52359: F=a U=cacti2 E= 0xF50x710x7F0x000x1C0x730x850x990x8B0x00 C= Report(30) S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsNotInTimeWindows.0=81667
14:03:25.190145 IP cacti1.XXX.net.52359 > host.snmp: F=apr U=cacti2 [!scoped PDU]d0_42_d8_6b_be_f7_20_1a_91_a2_12_e8_f0_c0_7e_9f_be_24_30_1c_1e_08_68_64_1a_c4_6b_4e_85_61_52_8b_5a_d5_26_5d_eb_5b_31_f7_cb_22_3d_78_f5_de_cb_74
14:03:25.190473 IP host.snmp > cacti1.XXX.net.52359: F=a U=cacti2 E= 0xF50x710x7F0x000x1C0x730x850x990x8B0x00 C= Report(30) S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsNotInTimeWindows.0=81668
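To make the capture above easier to read, the Report PDUs the device sends back can be tallied by counter name. This is a small sketch that assumes tcpdump's one-line SNMP output looks like the lines above; the function name is made up for illustration. As far as I understand the USM spec (RFC 3414), the first usmStatsUnknownEngineIDs report is just the normal engine ID discovery step, while usmStatsNotInTimeWindows is what the agent increments when it considers a request to be outside its time window.

```python
import re
from collections import Counter

# Matches e.g. "Report(30) S:snmpUsmMIB....usmStatsNotInTimeWindows.0=81665"
REPORT_RE = re.compile(r"Report\(\d+\)\s+S:(?P<oid>[\w.]+)=(?P<value>\d+)")


def tally_reports(lines):
    """Count SNMPv3 Report PDUs per USM statistics counter name."""
    counts = Counter()
    for line in lines:
        match = REPORT_RE.search(line)
        if match:
            # The OID ends in "<counterName>.0"; keep only the counter name.
            counts[match.group("oid").split(".")[-2]] += 1
    return counts


if __name__ == "__main__":
    # Reply lines copied from the capture above.
    capture = [
        "14:03:19.185566 IP host.snmp > cacti1.XXX.net.52359: F= U= E= 0xF50x710x7F0x000x1C0x730x850x990x8B0x00 C= Report(32) S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsUnknownEngineIDs.0=21276",
        "14:03:19.185949 IP host.snmp > cacti1.XXX.net.52359: F=a U=cacti2 E= 0xF50x710x7F0x000x1C0x730x850x990x8B0x00 C= Report(30) S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsNotInTimeWindows.0=81665",
        "14:03:21.187385 IP host.snmp > cacti1.XXX.net.52359: F=a U=cacti2 E= 0xF50x710x7F0x000x1C0x730x850x990x8B0x00 C= Report(30) S:snmpUsmMIB.usmMIBObjects.usmStats.usmStatsNotInTimeWindows.0=81666",
    ]
    for name, count in tally_reports(capture).items():
        print(name, count)
```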
I have been up and down these forums and I'm completely at a loss here... anyone have any ideas?
Halp!