I have a Linux-based Cacti box collecting data from three Linux-based boxes and three Windows 2003 boxes.
One of the Windows 2003 hosts (Melbourne) suddenly shows "Down" in Cacti and is no longer being graphed. However, Melbourne is online and working perfectly. When I click the device name to view its settings, the heading at the top does show SNMP data:
System: Hardware: x86 Family 15 Model 0 Stepping 7 AT/AT COMPATIBLE - Software: Windows Version 5.2 (Build 3790 Uniprocessor Free)
Uptime: 85134 (0 days, 0 hours, 14 minutes)
Hostname: MELBOURNE
Location:
Contact: Matthew Clark
Running snmpwalk against Melbourne also returns data:
root@Woolhara:~# snmpwalk -v 2c -c public melbourne
SNMPv2-MIB::sysDescr.0 = STRING: Hardware: x86 Family 15 Model 0 Stepping 7 AT/AT COMPATIBLE - Software: Windows Version 5.2 (Build 3790 Uniprocessor Free)
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.311.1.1.3.1.2
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (136295) 0:22:42.95
SNMPv2-MIB::sysContact.0 = STRING: Matthew Clark
SNMPv2-MIB::sysName.0 = STRING: MELBOURNE
SNMPv2-MIB::sysLocation.0 = STRING:
SNMPv2-MIB::sysServices.0 = INTEGER: 79
<snip>
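For reference, both uptime figures above are SNMP TimeTicks, which count hundredths of a second since the SNMP agent last (re)started. The following is a minimal sketch of that conversion; the function name `ticks_to_uptime` is made up for illustration and is not part of Cacti or Net-SNMP.

```shell
#!/usr/bin/env bash
# Convert an SNMP TimeTicks value (hundredths of a second) into
# "D days, H:MM:SS.cc". ticks_to_uptime is a hypothetical helper name.
ticks_to_uptime() {
    local ticks=$1
    local total_seconds=$(( ticks / 100 ))   # whole seconds
    local hundredths=$(( ticks % 100 ))      # leftover hundredths
    local days=$(( total_seconds / 86400 ))
    local hours=$(( (total_seconds % 86400) / 3600 ))
    local minutes=$(( (total_seconds % 3600) / 60 ))
    local seconds=$(( total_seconds % 60 ))
    printf '%d days, %d:%02d:%02d.%02d\n' \
        "$days" "$hours" "$minutes" "$seconds" "$hundredths"
}

ticks_to_uptime 85134    # value shown on the Cacti device page
ticks_to_uptime 136295   # value returned by snmpwalk
```

The first value works out to about 14 minutes and the second to about 22 minutes 43 seconds, so both readings agree that the agent on Melbourne had been up for less than half an hour, and they were taken roughly eight and a half minutes apart.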
I even deleted the entire device and all associated graphs from Cacti and re-created it from scratch. It still shows as down. This would seem to isolate the problem to Melbourne itself, but that contradicts the fact that snmpwalk returns (lots of) data.
Yet, all other devices in Cacti are working -- no issues at all.
So why would Cacti think a host is down, and what could I possibly be missing? What checks can I run that might narrow down the possibilities? I am highly proficient with PHP, Linux, and Cacti, but I am not knowledgeable enough about the internals of Cacti to go rooting around in the source without a flashlight...