monitor successful but host still down


krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

monitor successful but host still down

Post by krypsys »

Running v0.8.7a with several hosts up and running OK, but I'm having trouble getting POWERWARE UPS units to report 'up' in the devices. SNMP shows successful values coming back, yet the device status stays 'down'. Same deal with TCP ping: the result says 'TCP ping success', but the device reports 'down' status.

This is consistent across all three Powerware UPS devices we have in production.

Any thoughts would be most appreciated.
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

What is the downed host detection method used for that device? If it's SNMP, please run "snmpwalk -c ... -v 1 <target> system" and post the results.
Reinhard
krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

snmp response

Post by krypsys »

When set to SNMP, I see the host information (contact, etc.) in the upper right of the device information (and see the traffic by sniffing), but the device still shows down.

When set to 'tcp ping' it shows 'tcp success', but the host is still down.

Right now, it's set to SNMP.

per your request
snmpwalk -c [mycommunity] -v 1 192.168.2.19 system
Timeout: No Response from 192.168.2.19

Ah yes... that doesn't look good.
What are you thinking?
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

Not good, yes. Target is not responding, so cacti won't graph.
Please see first link of my sig
Reinhard
krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

Post by krypsys »

While troubleshooting this morning I tried that command again, and I am getting a response now:

user@syslog:~# snmpwalk -c [mycommunity] -v1 192.168.2.19 system
SNMPv2-MIB::sysDescr.0 = STRING: ConnectUPS Web/SNMP Card V3.11
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.534.1
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (2792252309) 323 days, 4:15:23.09
SNMPv2-MIB::sysContact.0 = STRING: User Name
SNMPv2-MIB::sysName.0 = STRING: ConnectUPS Web/SNMP Card
SNMPv2-MIB::sysLocation.0 = STRING: Some, Where
SNMPv2-MIB::sysServices.0 = INTEGER: 72

However, the host is still 'Down' with SNMP set as the monitor. I read through most of the information at your link, but it's not clear to me what my next troubleshooting steps should be.

wild!?

Any advice?
With sincere appreciation.
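
A side note on reading the sysUpTime line in the walk above: SNMP TimeTicks count hundredths of a second, so the raw number in parentheses and the "323 days, ..." text are the same value. The short sketch below only illustrates that arithmetic; the function name `format_timeticks` is made up for this example and is not part of Cacti or net-snmp.

```python
def format_timeticks(ticks):
    """Convert SNMP TimeTicks (hundredths of a second) to 'D days, H:MM:SS.hh'."""
    total_seconds, hundredths = divmod(ticks, 100)
    days, rem = divmod(total_seconds, 86400)   # 86400 seconds per day
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days} days, {hours}:{minutes:02d}:{seconds:02d}.{hundredths:02d}"

# Raw value taken from the snmpwalk output in this thread.
print(format_timeticks(2792252309))  # 323 days, 4:15:23.09
```

A changing uptime between two walks, as seen later in the thread, is a quick sign that the agent itself is alive and answering.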
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

Which downed host detection method is used, please?
Reinhard
krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

Post by krypsys »

SNMP, amigo.
Right now it's set to SNMP.
I tried TCP PING yesterday, which came back 'successful' in the GUI, but the device still showed 'down'.

grrrr...
krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

more information

Post by krypsys »

To troubleshoot further, I started capturing packets and see this:

15:50:17.273151 192.168.1.45.38000 > 192.168.2.19.161: C=[mycommunity] GetNextRequest(21) .0.1 (DF)
15:50:17.273151 192.168.1.45.38000 > 192.168.2.19.161: C=[mycommunity] GetNextRequest(21) .0.1 (DF)

Notice that there is no response from the UPS back to the Cacti server (.1.45).

So I ran this command manually:

user@syslog:~# snmpwalk -c [mycommunity] -v1 192.168.2.19 .0.1
Timeout: No Response from 192.168.2.19

HOWEVER, the 'system' one still works OK.
user@syslog:~# snmpwalk -c [mycommunity] -v1 192.168.2.19 system
SNMPv2-MIB::sysDescr.0 = STRING: ConnectUPS Web/SNMP Card V3.11
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.534.1
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (2792874556) 323 days, 5:59:05.56
SNMPv2-MIB::sysContact.0 = STRING: User Name
SNMPv2-MIB::sysName.0 = STRING: ConnectUPS Web/SNMP Card
SNMPv2-MIB::sysLocation.0 = STRING: Some, Where
SNMPv2-MIB::sysServices.0 = INTEGER: 72

So, at this point, I think it's related to the ".0.1 (DF)" tag...where is it getting this from? What is it 'walking' for that value? What is that value?

make sense?
hkspvt
Posts: 2
Joined: Wed Jun 04, 2008 3:50 pm

Post by hkspvt »

I'm having the same problem. A single one of my hosts is complaining that it's down, so graphs are not populating. I can both ping and run SNMP walks from the cacti server, but Cacti still shows it as down.

This is in Cacti 0.8.7b running on FreeBSD 7.0, Apache 2.2, PHP 5.

I currently have the "Downed Host Detection" set to SNMP. The top left corner (SNMP Information) populates, but the device still shows down. If I set the method to Ping, the Ping Results show "Cannot connect to host" - which I cannot replicate manually.

I've even tried wiping the host out of cacti and re-adding it, to no avail.

-HKS
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

snmpgetnext for ".0.1" should return system information. Please run snmpgetnext for OID ".0.1" manually from the CLI and post the results.
Reinhard
krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

Post by krypsys »

user@syslog:~# snmpgetnext -c [mycommunity] -v 1 192.168.2.19 .0.1
Timeout: No Response from 192.168.2.19.

to validate the command is ok:
user@syslog:~# snmpgetnext -c [mycommunity] -v 1 192.168.2.19 sysUpTime
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (2792984186) 323 days, 6:17:21.86

MORE INFORMATION
I just ran GETIF on the host and walked the MIB and, sure enough, I do not see a .0.1 value at all.

.0 = ccitt
.0.0 = ccitt.zeroDotZero
.0.0 = ccitt.nullOID

which is very strange, since I would expect the value .nullOID to be the .0.1 since it's beneath the zeroDotZero entry.

Any thoughts?

Querying .0.0 yields the same 'no response'
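
To compare these manual checks side by side, the sketch below runs a few probes against one host and reports which ones the agent answers. It is only a helper for reproducing the tests in this thread, not Cacti's own code. It assumes the net-snmp tools `snmpget` and `snmpgetnext` are on the PATH; the names `build_command` and `run_probes` are hypothetical, and the `.1.3` starting point is an assumed alternative, not something taken from the Cacti source.

```python
import subprocess

# Probes to try. ".0.1" is the GetNext seen in the packet capture; ".1.3" is an
# ASSUMED alternative starting point; the last one is a plain GET of sysUpTime.0.
CANDIDATE_PROBES = [
    ("snmpgetnext", ".0.1"),
    ("snmpgetnext", ".1.3"),
    ("snmpget", ".1.3.6.1.2.1.1.3.0"),
]

def build_command(tool, host, community, oid, timeout=2, retries=1):
    """Return the argv list for one SNMP v1 probe using net-snmp options."""
    return [tool, "-v", "1", "-c", community,
            "-t", str(timeout), "-r", str(retries), host, oid]

def run_probes(host, community, run=subprocess.run):
    """Run every probe; map (tool, oid) to True if the command exited with 0."""
    results = {}
    for tool, oid in CANDIDATE_PROBES:
        done = run(build_command(tool, host, community, oid),
                   capture_output=True, text=True)
        results[(tool, oid)] = (done.returncode == 0)
    return results

# Example (needs net-snmp installed and a reachable agent):
#   for probe, answered in run_probes("192.168.2.19", "mycommunity").items():
#       print(probe, "answered" if answered else "no answer")
```

If only the `.0.1` probe fails while the others succeed, that points at how the agent handles that particular request rather than at community strings or routing.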
hkspvt
Posts: 2
Joined: Wed Jun 04, 2008 3:50 pm

Post by hkspvt »

.0.xxxx are assigned by the ITU-T and are not generally used in the context of the Internet. Is there a prefix we're missing somewhere?

-HKS
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

Please find module ./lib/ping.php and locate function ping_snmp. You may want to change the magic numbers in there
Reinhard
krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

Post by krypsys »

I think my earlier statement about .0.0 not existing may have been misplaced. Continuing to dig into this, I noticed that the POWERWARE UPS was not responding to the GETIF application's SNMP GetNext request, just as it was not responding to Cacti's request.

The Cacti request looks like this:
14:28:23.648114 192.168.1.45.4586 > 192.168.2.19.161: C= GetNextRequest(23) .0.1

Where the GetIf request looks like this:
14:28:23.648114 192.168.1.49.4586 > 192.168.2.19.161: C= GetNextRequest(23) .0.0

In the end, both result in the POWERWARE not responding, so my earlier statement about .0.1 being 'nonexistent' was premature. That is, it may or may not exist on the POWERWARE UPS. All I know is that when asked by either application, it does not reply, whereas every other device we have replies fine to the .0.1 request.

So what .0.1 actually is remains an open question. Here is an observation, however, from another client:

"user@syslog:~# snmpwalk -c [mycommunity] -v1 192.168.12.1 .0.1 -O n
.1.3.6.1.2.1.1.1.0 = STRING: Intermec Technologies AP"

So, when I set the OID value to .0.1, the client responds with the .1.3.6.1.2.1.1.1.0 value.

Just an observation...I'm not an SNMP master here (obviously).

As for Gandalf's request, I have been trying to isolate which part of the ping_snmp function I need to massage and am at a loss. My PHP version is 5.2, so the OID variable should (always) get set to .1.3.6.1.2.1.1.3.0, but I can't isolate where the call to .0.1 gets made in the SNMP query.

Could use some more advice. My development background here is weak, but I'm trying!

Sincere appreciation on this. I'm excited to be getting into the details of this problem!
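
On the observation that a request for .0.1 comes back with .1.3.6.1.2.1.1.1.0: that matches how SNMP GetNext is defined. The agent does not look up the requested OID itself; it returns the first object it has that sorts after that OID, comparing arc by arc. The sketch below models this with a toy list of OIDs. The names `parse_oid`, `get_next`, and `toy_agent` are invented for illustration, and it says nothing about why the Powerware card stays silent.

```python
from bisect import bisect_right

def parse_oid(text):
    """Turn '.1.3.6.1' into a tuple of ints so OIDs compare arc by arc."""
    return tuple(int(part) for part in text.strip(".").split("."))

def get_next(agent_oids, requested):
    """Return the first OID held by the agent that sorts after `requested`."""
    ordered = sorted(parse_oid(oid) for oid in agent_oids)
    index = bisect_right(ordered, parse_oid(requested))
    if index == len(ordered):
        return None  # nothing left after the requested OID
    return "." + ".".join(str(arc) for arc in ordered[index])

# Toy agent that only knows three objects from the system group.
toy_agent = [
    ".1.3.6.1.2.1.1.1.0",  # sysDescr.0
    ".1.3.6.1.2.1.1.2.0",  # sysObjectID.0
    ".1.3.6.1.2.1.1.3.0",  # sysUpTime.0
]

print(get_next(toy_agent, ".0.1"))  # .1.3.6.1.2.1.1.1.0
```

Since .0.1 sorts before everything under .1.3, a GetNext for it is effectively a request for the first object the agent has, which on most devices is sysDescr.0.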
krypsys
Posts: 44
Joined: Thu Jul 06, 2006 4:30 pm

even more strange behavior

Post by krypsys »

What's strange is that the unit stays 'down' even when I change the monitor. When I change it to TCP port 23, I immediately see a success in tcpdump:

16:56:53.948099 192.168.1.45.37452 > 192.168.2.19.23: S 1791524515:1791524515(0) win 5840 <mss 1460,sackOK,timestamp 499136484 0,nop,wscale 5> (DF)
16:56:53.949448 192.168.2.19.23 > 192.168.1.45.37452: S 1719005:1719005(0) ack 1791524516 win 8192 <mss 1440>
16:56:53.964901 192.168.1.45.37452 > 192.168.2.19.23: . ack 1 win 5840 (DF)
16:56:53.964917 192.168.1.45.37452 > 192.168.2.19.23: F 1:1(0) ack 1 win 5840 (DF)
16:56:53.979112 192.168.2.19.23 > 192.168.1.45.37452: FP 1:266(265) ack 2 win 8190
16:56:53.979928 192.168.2.19.23 > 192.168.1.45.37452: F 267:267(0) ack 2 win 8190
16:56:53.980962 192.168.2.19.23 > 192.168.1.45.37452: F 268:268(0) ack 2 win 8190
16:56:53.989182 192.168.1.45.37452 > 192.168.2.19.23: R 1791524517:1791524517(0) win 0 (DF)

however, if I leave it alone for the normal 5s period, I see this:

16:55:15.834731 192.168.1.45.56660 > 192.168.2.19.33439: S 1700466563:1700466563(0) win 5840 <mss 1460,sackOK,timestamp 499126671 0,nop,wscale 5> (DF)

Port 33439!? That's not the port I told it to monitor!?

This behaviour is so bizarre!
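
To check the TCP side independently of Cacti, a plain connect test against the port that was configured can help separate "the port is reachable" from "the poller is probing a different port". The sketch below is a generic TCP connect probe, not Cacti's implementation; the function name `tcp_ping` and the 2 second timeout are assumptions made for this example.

```python
import socket

def tcp_ping(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port completes within `timeout`."""
    try:
        # create_connection performs the full three-way handshake, then the
        # `with` block closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

# Example, using the address and port from this thread:
#   print(tcp_ping("192.168.2.19", 23))
```

Running it from the Cacti server against port 23 and against port 33439 would show whether the difference seen in the capture also shows up outside the poller.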