monitor successful but host still down
I'm running v0.8.7a with several hosts up and running OK, but I'm having trouble getting Powerware UPS units to report 'up' in Devices. SNMP shows successful values coming back, yet the device status stays 'down'. Same deal with TCP ping: it reports 'TCP ping success', but the device status is still 'down'.
This is consistent across all three Powerware UPS devices we have in production.
Any thoughts would be most appreciated.
SNMP response
When set to SNMP, I see the host information (contact, etc.) in the upper right of the device information (and see the traffic by sniffing), but the device still shows down.
When set to 'tcp ping' it shows 'tcp success', but the host is still down.
Right now, it's set to SNMP.
Per your request:
snmpwalk -c [mycommunity] -v 1 192.168.2.19 system
Timeout: No Response from 192.168.2.19
Ah yes, that doesn't look good...
What are you thinking?
While troubleshooting this morning I tried that command again, and I'm getting a response now:
user@syslog:~# snmpwalk -c [mycommunity] -v1 192.168.2.19 system
SNMPv2-MIB::sysDescr.0 = STRING: ConnectUPS Web/SNMP Card V3.11
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.534.1
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (2792252309) 323 days, 4:15:23.09
SNMPv2-MIB::sysContact.0 = STRING: User Name
SNMPv2-MIB::sysName.0 = STRING: ConnectUPS Web/SNMP Card
SNMPv2-MIB::sysLocation.0 = STRING: Some, Where
SNMPv2-MIB::sysServices.0 = INTEGER: 72
Only, the host is still 'Down' with SNMP set as the monitor. I read through most of the information at your link, but it's not really clear to me what my next troubleshooting steps should be.
Wild!?
Any advice?
With sincere appreciation.
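For anyone cross-checking the walk output above: sysUpTime is reported in TimeTicks, which are hundredths of a second, so the raw counter in parentheses can be converted by hand to confirm the agent is returning sane values. A minimal Python sketch of that arithmetic follows; the function name is made up for illustration and is not part of net-snmp or Cacti.

```python
def format_timeticks(ticks: int) -> str:
    """Convert SNMP TimeTicks (1/100 s) to 'D days, H:MM:SS.hh'."""
    total_seconds, hundredths = divmod(ticks, 100)
    days, rem = divmod(total_seconds, 86400)   # 86400 seconds per day
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days} days, {hours}:{minutes:02d}:{seconds:02d}.{hundredths:02d}"


# Raw counter taken from the snmpwalk output in this thread.
print(format_timeticks(2792252309))  # 323 days, 4:15:23.09
```

The result matches the human-readable part of the sysUpTimeInstance line, so at least the 'system' subtree is being answered correctly.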
more information
To troubleshoot further, I started capturing packets and see this:
15:50:17.273151 192.168.1.45.38000 > 192.168.2.19.161: C=[mycommunity] GetNextRequest(21) .0.1 (DF)
15:50:17.273151 192.168.1.45.38000 > 192.168.2.19.161: C=[mycommunity] GetNextRequest(21) .0.1 (DF)
But notice there is no response from the UPS back to the Cacti server (.1.45).
So I ran this command manually:
user@syslog:~# snmpwalk -c [mycommunity] -v1 192.168.2.19 .0.1
Timeout: No Response from 192.168.2.19
HOWEVER, the 'system' one still works OK:
user@syslog:~# snmpwalk -c [mycommunity] -v1 192.168.2.19 system
SNMPv2-MIB::sysDescr.0 = STRING: ConnectUPS Web/SNMP Card V3.11
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.534.1
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (2792874556) 323 days, 5:59:05.56
SNMPv2-MIB::sysContact.0 = STRING: User Name
SNMPv2-MIB::sysName.0 = STRING: ConnectUPS Web/SNMP Card
SNMPv2-MIB::sysLocation.0 = STRING: Some, Where
SNMPv2-MIB::sysServices.0 = INTEGER: 72
So, at this point, I think it's related to the ".0.1 (DF)" tag. Where is it getting this from? What is it 'walking' for that value? What is that value?
Make sense?
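A side note on the capture: the trailing (DF) is tcpdump's marker for the IP Don't Fragment flag, so only the .0.1 is the requested OID. To check the "request goes out, nothing comes back" pattern over a longer capture, here is a rough Python sketch that scans tcpdump text output for requests to port 161 that never get a reply. The regex is an assumption that only targets the line layout shown above; other tcpdump versions print things differently.

```python
import re

# Matches lines such as:
# 15:50:17.273151 192.168.1.45.38000 > 192.168.2.19.161: C=... GetNextRequest(21) .0.1 (DF)
LINE_RE = re.compile(
    r"^\S+\s+(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+)\s+>\s+"
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+):"
)


def unanswered_snmp_requests(lines):
    """Return sorted (client, agent) pairs that sent to port 161 with no reply seen."""
    asked, answered = set(), set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a packet line in the expected layout
        if m["dport"] == "161":
            asked.add((m["src"], m["dst"]))
        elif m["sport"] == "161":
            answered.add((m["dst"], m["src"]))
    return sorted(asked - answered)


capture = [
    "15:50:17.273151 192.168.1.45.38000 > 192.168.2.19.161: "
    "C=[mycommunity] GetNextRequest(21) .0.1 (DF)",
]
print(unanswered_snmp_requests(capture))
```

With only the request line from this thread as input, the pair (192.168.1.45, 192.168.2.19) is reported as unanswered, which is the behaviour described above.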
I'm having the same problem. A single one of my hosts is reported as down, so graphs are not populating. I can both ping it and run SNMP walks against it from the Cacti server, but Cacti still shows it as down.
This is in Cacti 0.8.7b running on FreeBSD 7.0, Apache 2.2, PHP 5.
I currently have the "Downed Host Detection" set to SNMP. The top left corner (SNMP Information) populates, but the device still shows down. If I set the method to Ping, the Ping Results show "Cannot connect to host" - which I cannot replicate manually.
I've even tried wiping the host out of Cacti and re-adding it, to no avail.
-HKS
user@syslog:~# snmpgetnext -c [mycommunity] -v 1 192.168.2.19 .0.1
Timeout: No Response from 192.168.2.19.
To validate that the command itself is OK:
user@syslog:~# snmpgetnext -c [mycommunity] -v 1 192.168.2.19 sysUpTime
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (2792984186) 323 days, 6:17:21.86
MORE INFORMATION
I just ran GetIf against the host and walked the MIB and, sure enough, I do not see a .0.1 value at all.
.0 = ccitt
.0.0 = ccitt.zeroDotZero
.0.0 = ccitt.nullOID
Which is very strange: I would expect .nullOID to be .0.1, since it sits beneath the zeroDotZero entry.
Any thoughts?
Querying .0.0 yields the same 'no response'.
I think my earlier statement about .0.1 not existing may have been premature. Continuing to dig into this, I noticed that the Powerware UPS was not responding to the GetIf application's SNMP request, just as it was not responding to Cacti's request.
The Cacti request looks like this:
14:28:23.648114 192.168.1.45.4586 > 192.168.2.19.161: C= GetNextRequest(23) .0.1
Where the GetIf request looks like this:
14:28:23.648114 192.168.1.49.4586 > 192.168.2.19.161: C= GetNextRequest(23) .0.0
In the end, both result in the Powerware not responding, so my earlier statement about .0.1 being 'nonexistent' was premature. That is, it may or may not exist on the Powerware UPS. All I know is that when asked by either application it does not reply, whereas every other device we have replies fine to the .0.1 request.
So what .0.1 actually is remains an open question. Here is an observation, however, from another client:
"user@syslog:~# snmpwalk -c [mycommunity] -v1 192.168.12.1 .0.1 -O n
.1.3.6.1.2.1.1.1.0 = STRING: Intermec Technologies AP"
So, when I set the OID value to .0.1, the client responds with the .1.3.6.1.2.1.1.1.0 value.
Just an observation...I'm not an SNMP master here (obviously).
As for Gandalf's request, I have been trying to isolate which part of the ping_snmp function I need to massage, and I'm at a loss. My PHP version is 5.2, so the OID variable should (always) get set to .1.3.6.1.2.1.1.3.0, but I can't isolate where the call to .0.1 gets made in the SNMP query.
Could use some more advice. My development background here is weak, but I'm trying!
Sincere appreciation on this. I'm excited to be getting into the details of this problem!
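On the "what is .0.1" question: as far as SNMP GetNext semantics go, the requested OID does not have to exist. The agent is expected to answer with the first object that sorts after it, which would explain why the Intermec AP answers a .0.1 request with .1.3.6.1.2.1.1.1.0 (sysDescr.0). Below is a toy Python model of that lookup. The three-entry agent table is made up for illustration, and none of this is Cacti or net-snmp code.

```python
def parse_oid(text):
    """Turn '.1.3.6.1' into a tuple of ints so each arc compares numerically."""
    return tuple(int(part) for part in text.strip(".").split("."))


def get_next(agent_table, requested):
    """Return the first (oid, value) sorting after `requested`, or None at end of MIB."""
    wanted = parse_oid(requested)
    for oid in sorted(agent_table, key=parse_oid):
        if parse_oid(oid) > wanted:
            return oid, agent_table[oid]
    return None


# Hypothetical agent contents, loosely based on the outputs in this thread.
table = {
    ".1.3.6.1.2.1.1.1.0": "Intermec Technologies AP",
    ".1.3.6.1.2.1.1.2.0": "(sysObjectID value)",
    ".1.3.6.1.2.1.1.3.0": "(sysUpTime value)",
}

print(get_next(table, ".0.1"))                # first object in the table
print(get_next(table, ".1.3.6.1.2.1.1.1.0"))  # the object after sysDescr.0
```

By that logic an agent should answer .0.1 with its first object, so a timeout hints that the UPS card mishandles requests that start outside its own tree. That is a guess from the symptoms in this thread, not something confirmed.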
even more strange behavior
What's strange is that the unit stays 'down' even when I change the monitor. When I change it to TCP port 23, I immediately see a success in tcpdump:
16:56:53.948099 192.168.1.45.37452 > 192.168.2.19.23: S 1791524515:1791524515(0) win 5840 <mss 1460,sackOK,timestamp 499136484 0,nop,wscale 5> (DF)
16:56:53.949448 192.168.2.19.23 > 192.168.1.45.37452: S 1719005:1719005(0) ack 1791524516 win 8192 <mss 1440>
16:56:53.964901 192.168.1.45.37452 > 192.168.2.19.23: . ack 1 win 5840 (DF)
16:56:53.964917 192.168.1.45.37452 > 192.168.2.19.23: F 1:1(0) ack 1 win 5840 (DF)
16:56:53.979112 192.168.2.19.23 > 192.168.1.45.37452: FP 1:266(265) ack 2 win 8190
16:56:53.979928 192.168.2.19.23 > 192.168.1.45.37452: F 267:267(0) ack 2 win 8190
16:56:53.980962 192.168.2.19.23 > 192.168.1.45.37452: F 268:268(0) ack 2 win 8190
16:56:53.989182 192.168.1.45.37452 > 192.168.2.19.23: R 1791524517:1791524517(0) win 0 (DF)
However, if I leave it alone for the normal 5s period, I see this:
16:55:15.834731 192.168.1.45.56660 > 192.168.2.19.33439: S 1700466563:1700466563(0) win 5840 <mss 1460,sackOK,timestamp 499126671 0,nop,wscale 5> (DF)
Port 33439!? That's not the port I told it to monitor!?
This behaviour is so bizarre!
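For comparison, a TCP "ping" is just a connect attempt: the SYN, SYN-ACK, ACK exchange in the port 23 trace is what a successful one looks like on the wire. A minimal Python sketch for testing a port by hand from the poller box is below. It is not how Cacti implements its check; the function name and timeout are arbitrary choices for illustration.

```python
import socket


def tcp_ping(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port completes within `timeout` seconds."""
    try:
        # create_connection performs the three-way handshake; leaving the
        # with-block closes the socket again.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


# Example call, using the address and port from the trace in this thread:
# tcp_ping("192.168.2.19", 23)
```

Running it against both port 23 and port 33439 would show whether the UPS answers on each; going by the traces, only port 23 would be expected to succeed.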