I did manage to find an old thread from 2008 that described almost the same problem, but not quite.
I have an Ubuntu 10 server with Cacti 0.8.7g and Spine 0.8.7g installed. Things were going great until I realized that some hosts are falsely being reported as "Down" when they aren't. I then noticed that the graphs for those "Down" hosts aren't getting updated either.
I did the obvious thing and checked each host's SNMP credentials and whether it can be pinged; both SNMP queries and pings work just fine. I then checked the Cacti log and saw that it marks those hosts as down and then never polls them again.
As soon as I change the poller to cmd.php, they all magically switch to "Recovering". I updated to 0.8.7g from the default 0.8.7e that Ubuntu ships because I thought this might be a known problem that had been fixed, but I am now at a loss as to how to fix it. I have rebuilt the poller cache countless times. Like I said, things work great when the poller is set to cmd.php, but as soon as I change it to spine, the hosts that showed "Down" before go right back to "Down". The strange thing, too, is that it isn't all of my hosts, just about 5 of the 60 or so.
Any help would be greatly appreciated.
Spine/Poller Issue
Thanks,
Bryan
What downtime detection method is Cacti using for each problem device?
With the Cacti logging level set to medium or higher, check the log to see why it thinks each device is down.
Down devices will not get polled.
| Scripts: Monitor processes | RFC1213 MIB | DOCSIS Stats | Dell PowerEdge | Speedfan | APC UPS | DOCSIS CMTS | 3ware | Motorola Canopy |
| Guides: Windows Install | [HOWTO] Debug Windows NTFS permission problems |
| Tools: Windows All-in-one Installer |
Re: Spine/Poller Issue
I find that when using SNMP for downtime detection, spine always gets "Host did not respond to SNMP", even though it can detect all the network interfaces when the host is created.
I think I have narrowed the problem down to the ping_snmp function in ping.c of the spine package.
Code:
int ping_snmp(host_t *host, ping_t *ping) {
    char *poll_result;
    char *oid;
    int num_oids_checked = 0; /* <<<======== */
    double begin_time, end_time, total_time;
    double one_thousand = 1000.00;

    if (host->snmp_session) {
        if ((strlen(host->snmp_community) != 0) || (host->snmp_version == 3)) {
            /* by default, we look at sysUptime */
            if ((oid = strdup(".1.3")) == NULL) {
                die("ERROR: malloc(): strdup() oid ping.c failed");
            }

            /* record start time */
            retry:
            begin_time = get_time_as_double();

            poll_result = snmp_getnext(host, oid);

            /* record end time */
            end_time = get_time_as_double();

            free(oid);

            total_time = (end_time - begin_time) * one_thousand;

            if ((strlen(poll_result) == 0) || IS_UNDEFINED(poll_result)) {
                /* if (num_oids_checked > 1) { */
                if (num_oids_checked < 2) { /* <<<============= */
                    if (num_oids_checked == 0) {
                        /* use sysUptime as a backup if the generic OID fails */
                        if ((oid = strdup(".1.3.6.1.2.1.1.3.0")) == NULL) {
                            die("ERROR: malloc(): strdup() oid ping.c failed");
                        }
                    } else {
                        /* use sysDescription as a backup if sysUptime fails */
                        if ((oid = strdup(".1.3.6.1.2.1.1.1.0")) == NULL) {
                            die("ERROR: malloc(): strdup() oid ping.c failed");
                        }
                    }

                    free(poll_result);
                    num_oids_checked++; /* <================ */
                    goto retry;
                } else {
                    snprintf(ping->snmp_response, SMALL_BUFSIZE, "Host did not respond to SNMP");
                    free(poll_result);
                    return HOST_DOWN;
                }
            } else {
To my tired eyes, the check on num_oids_checked looks wrong.
num_oids_checked is initialized to zero at the beginning of the function, so the original "> 1" test never passes, the counter never gets incremented, and the backup OIDs are never tried. So I have changed the test to "< 2", and it seems to be working for me now.
If I'm wrong about this, let me know.
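To make the difference between the two tests easier to see, here is a minimal, self-contained sketch of just the retry decision. It is not the spine source: fake_getnext and ping_sketch are hypothetical names invented for this illustration, and fake_getnext only pretends to be snmp_getnext by answering from a chosen attempt onward.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HOST_UP   0
#define HOST_DOWN 1

/* Hypothetical stand-in for snmp_getnext(): returns a non-empty string
 * only from attempt number `answers_on` onward (0 = generic OID,
 * 1 = sysUptime, 2 = sysDescription). Pass -1 for a host that never
 * answers. */
static const char *fake_getnext(int attempt, int answers_on) {
    return (answers_on >= 0 && attempt >= answers_on) ? "42" : "";
}

/* Mirrors only the retry logic quoted above. use_patched_test selects
 * the "< 2" test (non-zero) or the original "> 1" test (zero).
 * *attempts receives the number of SNMP requests that were made. */
static int ping_sketch(int answers_on, int use_patched_test, int *attempts) {
    int num_oids_checked = 0;

    assert(attempts != NULL);
    *attempts = 0;

    for (;;) {
        const char *result = fake_getnext(num_oids_checked, answers_on);
        (*attempts)++;

        if (strlen(result) != 0) {
            return HOST_UP;
        }

        /* The counter starts at 0, so "> 1" can never be true here. */
        int may_retry = use_patched_test ? (num_oids_checked < 2)
                                         : (num_oids_checked > 1);
        if (!may_retry) {
            return HOST_DOWN;
        }
        num_oids_checked++;
    }
}
```

Under these assumptions, a host that only answers the second OID is reported down after a single request with the "> 1" test (ping_sketch(1, 0, &n)), and reported up after two requests with the "< 2" test (ping_sketch(1, 1, &n)). A host that never answers is tried three times with the "< 2" test before being marked down.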
Re: Spine/Poller Issue
Are you using spine 0.8.7g plus its latest patches?
Re: Spine/Poller Issue
I'm using 0.8.7g plus unified_issues.patch.
I scanned through svn.cacti.net, and it seems the "> 1" if statement was introduced in Revision 3844 on Dec 31, 2006. Is nobody using SNMP for the availability check with spine?