Down Host Detection
Greetings,
I have looked through the posts and have not found anyone with this problem whose thread was left unresolved.
I have a host that does not support SNMP; I am pulling stats off it with a Net::Telnet script. I can ping the device from my Cacti server as the cacti user:
$ ping 10.5.110.25
PING 10.5.110.25 (10.5.110.25) 56(84) bytes of data.
64 bytes from 10.5.110.25: icmp_seq=0 ttl=63 time=1.99 ms
64 bytes from 10.5.110.25: icmp_seq=1 ttl=63 time=1.51 ms
64 bytes from 10.5.110.25: icmp_seq=2 ttl=63 time=1.35 ms
--- 10.5.110.25 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 1.352/1.622/1.995/0.272 ms, pipe 2
My Downed Host Detection setting is SNMP and PING, Ping Type = ICMP.
When I change Detection to PING only, the host does not come up; in fact, I lose about 40 of my other hosts, with this in the error log:
ERROR: HOST EVENT: Host is DOWN Message: UDP: Ping timed out
If I am set for ICMP ping, why is it trying a UDP ping?
I wait for a few poll periods, then change the Ping Type to UDP. Nothing changes. When I go back to Detection = SNMP and PING, keeping Ping Type = UDP, I end up back where I started, with all my hosts up:
01/09/2007 05:55:15 PM - CACTID: Poller[0] Host[314] NOTICE: HOST EVENT: Host Returned from DOWN State
The host that does not support SNMP stays down through this whole process.
I tried changing to cmd.php for testing, but it takes too long to run my poll, and basically wrecks everything.
I see this in the debug log:
e = '2007-01-04 17:31:00', status_last_error = 'SNMP not performed due to setting or ping result., Cannot connect to host', min_time = '0.65053', max_time = '0.93150', cur_time = '0.93150', avg_time = '0.79102', total_polls = '1695', failed_polls = '1693', availability = '0.11799410029499' where hostname = '10.5.110.25'"
01/09/2007 12:15:07 PM - CMDPHP: Poller[0] DEBUG: SQL Exec: "update host set status = '1', status_event_count = '1378', status_fail_date = '2007-01-04 17:38:00', status_rec_date = '2007-01-04 17:31:00', status_last_error = 'SNMP not performed due to setting or ping result., Cannot connect to host', min_time = '0.65053', max_time = '0.93150', cur_time = '0.93150', avg_time = '0.79102', total_polls = '1696', failed_polls = '1694', availability = '0.11792452830189' where hostname = '10.5.110.25'"
01/09/2007 12:20:05 PM - CMDPHP: Poller[0] DEBUG: SQL Exec: "update host set status = '1', status_event_count = '1379', status_fail_date = '2007-01-04 17:38:00', status_rec_date = '2007-01-04 17:31:00', status_last_error = 'SNMP not performed due to setting or ping result., Cannot connect to host', min_time = '0.65053', max_time = '0.93150', cur_time = '0.93150', avg_time = '0.79102', total_polls = '1697', failed_polls = '1695', availability = '0.11785503830289' where hostname = '10.5.110.25'"
01/09/2007 12:25:05 PM - CMDPHP: Poller[0] DEBUG: SQL Exec: "update host set status = '1', status_event_count = '1380', status_fail_date = '2007-01-04 17:38:00', status_rec_date = '2007-01-04 17:31:00', status_last_error = 'SNMP not performed due to setting or ping result., Cannot connect to host', min_time = '0.65053', max_time = '0.93150', cur_time = '0.93150', avg_time = '0.79102', total_polls = '1698', failed_polls = '1696', availability = '0.11778563015312' where hostname = '10.5.110.25'"
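For reference, the availability figure in these log lines appears consistent with the share of successful polls expressed as a percentage. A minimal Python sketch, assuming that formula (it is inferred from the numbers in the log, not taken from the Cacti source, and the function name is hypothetical):

```python
def availability_percent(total_polls: int, failed_polls: int) -> float:
    """Percentage of polls that succeeded (assumed formula, see lead-in)."""
    if total_polls <= 0:
        raise ValueError("total_polls must be positive")
    return (total_polls - failed_polls) / total_polls * 100.0


# (total_polls, failed_polls) pairs copied from the debug log lines above.
for total, failed in [(1695, 1693), (1696, 1694), (1698, 1696)]:
    print(total, failed, availability_percent(total, failed))
```

In every logged line total_polls minus failed_polls is 2, so under this reading the host has answered only 2 of roughly 1,700 polls.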
The cacti and cactid versions are 0.8.6i.
Any ideas?
I opened up my firewall; however, I cannot get a response from this host on the UDP ping:
hping2 10.5.110.25 --udp -p 33439
HPING 10.5.110.25 (eth0 10.5.110.25): udp mode set, 28 headers + 0 data bytes
--- 10.5.110.25 hping statistic ---
6 packets tramitted, 0 packets received, 100% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms
This device is a remote power strip. It does not have much intelligence.
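To illustrate why a silent device looks down to a UDP ping: a UDP probe only learns something if the target answers or the sender receives an ICMP port-unreachable error. A minimal Python sketch of that idea follows. It is not Cacti's actual implementation; the function name, the dummy payload and the timeout are assumptions, and the port is simply the one used in the hping2 test.

```python
import socket


def udp_probe(host: str, port: int = 33439, timeout: float = 1.0) -> str:
    """Send one UDP datagram and classify what comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # connect() on a UDP socket lets the OS report ICMP errors to us.
        sock.connect((host, port))
        try:
            sock.send(b"x" * 23)  # arbitrary dummy payload (assumption)
            sock.recv(1024)
            return "reply"
        except ConnectionRefusedError:
            # ICMP port unreachable: the host is alive, the port is closed.
            return "port-unreachable"
        except socket.timeout:
            # Nothing came back: dropped, filtered, or the host is down.
            return "timeout"


# Example call (commented out so the sketch has no side effects):
# print(udp_probe("10.5.110.25"))
```

A device that drops UDP to closed ports without sending any ICMP error would always land in the "timeout" branch, which a UDP-based check cannot tell apart from a host that is really down.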
I also tried to allow cacti to use the ICMP Ping and cleared the poller cache, but this did not seem to work:
chmod 4711 /usr/local/cactid/bin/cactid
chmod +s /usr/local/cactid/bin/cactid
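Raw ICMP sockets normally require root privileges, which is why the binary is given the setuid bit here. A small Python sketch for confirming that the bit is actually set and who owns the file; the helper name is hypothetical, and the path in the usage comment is the one from the commands above:

```python
import os
import stat
from typing import Tuple


def setuid_info(path: str) -> Tuple[bool, int]:
    """Return (setuid bit set?, owner uid) for the given file."""
    info = os.stat(path)
    return bool(info.st_mode & stat.S_ISUID), info.st_uid


# Example call (commented out because the path may not exist elsewhere):
# is_setuid, owner = setuid_info("/usr/local/cactid/bin/cactid")
# print("setuid:", is_setuid, "owned by root:", owner == 0)
```

The setuid bit only grants root privileges if the file is owned by root (uid 0), so both values matter. Note also that chmod 4711 should already set the setuid bit by itself.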
However, when I snoop my interface, I don't see ICMP pings, only the UDP ones:
tcpdump | grep 10.5.110.25
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
15:30:17.916437 IP admin.x.x.51691 > 10.5.110.25.33439: UDP, length 23
15:30:18.319000 IP admin.x.x.51691 > 10.5.110.25.33439: UDP, length 23
15:30:18.322002 IP admin.x.x51691 > 10.5.110.25.33439: UDP, length 23
Is there something else I need to try?
Is there a way to ignore up/down state per host?
thanks
- gandalf
- Developer
- Posts: 22383
- Joined: Thu Dec 02, 2004 2:46 am
- Location: Muenster, Germany
- Contact:
yardus9 wrote: This device is a remote power strip. It does not have much intelligence.
Ahhh. Have I already advised reading my NaN Debugging Howto? There's an entry on reducing MAX SNMP OID Get Size. Start with a value of 1 for slow devices.
Also check the results of Downed Host Detection by setting the poller logging level to DEBUG and watching the log for Host[...] entries, where the number corresponds to the ID of your host (you can find it in the URL when editing the device).
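One way to pull out just those entries is to filter the log on the Host[id] token. A minimal Python sketch, assuming the log line format shown earlier in this thread; the function name and the log path in the usage comment are hypothetical, and the second sample line is made up to show that a different host ID is not matched:

```python
import re
from typing import Iterable, Iterator


def host_entries(lines: Iterable[str], host_id: int) -> Iterator[str]:
    """Yield only the log lines tagged with Host[host_id]."""
    tag = re.compile(r"\bHost\[%d\]" % host_id)
    for line in lines:
        if tag.search(line):
            yield line.rstrip("\n")


sample = [
    # Real line quoted from the first post.
    "01/09/2007 05:55:15 PM - CACTID: Poller[0] Host[314] NOTICE: HOST EVENT: Host Returned from DOWN State",
    # Made-up line with a different host ID, for contrast.
    "01/09/2007 05:55:15 PM - CACTID: Poller[0] Host[31] NOTICE: HOST EVENT: Host Returned from DOWN State",
]
for entry in host_entries(sample, 314):
    print(entry)

# Against a real log file (hypothetical path):
# with open("/path/to/cacti.log") as fh:
#     for entry in host_entries(fh, 314):
#         print(entry)
```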
Reinhard
- gandalf
yardus9 wrote: However, when i snoop my interface, I dont see ICMP pings, still the UDP:
tcpdump | grep 10.5.110.25
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
15:30:17.916437 IP admin.x.x.51691 > 10.5.110.25.33439: UDP, length 23
15:30:18.319000 IP admin.x.x.51691 > 10.5.110.25.33439: UDP, length 23
15:30:18.322002 IP admin.x.x51691 > 10.5.110.25.33439: UDP, length 23
Oh, this seems to be a bug. Hmm, let me try this again. And sorry for misunderstanding the SNMP usage, I read too fast ...
Reinhard
- TheWitness
- Developer
- Posts: 17007
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
- Contact:
This is a change that the team voted on. If the SNMP string is "blank", we still perform an ICMP/UDP ping, and if the host does not respond to either, we still mark it as down.
Argh, I know the solution, but have been holding off for too long. I may regret what I think I am about to do.
TheWitness
oh, until then, what you need to do is make a small change to lib/ping.php, revert to 0.8.6i. No other changes required.
Me.
True understanding begins only when we realize how little we truly understand...
Life is an adventure, let yours begin with Cacti!
Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages
For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
My current version is cacti-0.8.6i. Should I revert to the ping.php in cacti-0.8.6h? Or should I just edit the file myself, if it is a simple fix?
Going forward, does it make sense to detect the host status based on device template, or something more granular? I would imagine that would be a pretty big change.
thanks
- TheWitness
I need to find the cycles to write it. After that, I will let you know.
TheWitness
- TheWitness
What poller are you using?
TheWitness