Spine/Poller Issue

the_sphynx
Posts: 21
Joined: Wed Aug 04, 2004 4:28 pm
Location: Thornton, CO

Spine/Poller Issue

Post by the_sphynx »

I did manage to find an old thread from 2008 that described almost the same problem, but not quite.

I have an Ubuntu 10 server with Cacti 0.8.7g and Spine 0.8.7g installed. Things were going great until I realized that some hosts are being falsely reported as "Down" when they aren't. That led me to notice that the graphs for those "Down" hosts aren't being updated either.
I did the obvious thing and checked each host's SNMP credentials and pingability; SNMP queries and pings both come back fine. I then checked the Cacti log and saw that it marks those hosts as down and then never polls them again.
As soon as I change the poller to cmd.php, they all magically switch to "Recovering". I updated to 0.8.7g from the default 0.8.7e that Ubuntu ships because I thought this might be a known problem that had been fixed, but I am now at a loss as to how to fix it. I have rebuilt the poller cache countless times. Like I said, things work great with cmd.php as the poller, but as soon as I change it to spine, the hosts that showed "Down" before go right back to "Down". The strange thing, too, is that it isn't all of my hosts, just about 5 of the 60 or so.
Any help would be greatly appreciated.
Thanks,

Bryan
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

What downtime detection method is Cacti using for each problem device?

With the Cacti logging level set to medium or higher, check the log to see why it thinks each device is down.

Down devices will not get polled.
wkchu
Posts: 2
Joined: Mon Aug 02, 2010 11:33 pm
Location: Kelana Jaya, Malaysia

Re: Spine/Poller Issue

Post by wkchu »

I find that when using SNMP for downtime detection, spine always gets "Host did not respond to SNMP", even though it can detect all the network interfaces when the host is created.

I think I have narrowed the problem down to the ping_snmp function in ping.c of the spine package.
----

int ping_snmp(host_t *host, ping_t *ping) {
        char *poll_result;
        char *oid;
        int num_oids_checked = 0;/*<<<========*/
        double begin_time, end_time, total_time;
        double one_thousand = 1000.00;

        if (host->snmp_session) {
                if ((strlen(host->snmp_community) != 0) || (host->snmp_version == 3)) {
                        /* by default, we look at sysUptime */
                        if ((oid = strdup(".1.3")) == NULL) {
                                die("ERROR: malloc(): strdup() oid ping.c failed");
                        }

                        /* record start time */
                        retry:
                        begin_time = get_time_as_double();

                        poll_result = snmp_getnext(host, oid);

                        /* record end time */
                        end_time = get_time_as_double();

                        free(oid);

                        total_time = (end_time - begin_time) * one_thousand;

                        if ((strlen(poll_result) == 0) || IS_UNDEFINED(poll_result)) {
/*                                if (num_oids_checked > 1 ) {*/
                                if (num_oids_checked < 2 ) {     /*<<<=============*/
                                        if (num_oids_checked == 0) {
                                                /* use sysUptime as a backup if the generic OID fails */
                                                if ((oid = strdup(".1.3.6.1.2.1.1.3.0")) == NULL) {
                                                        die("ERROR: malloc(): strdup() oid ping.c failed");
                                                }
                                        }else{
                                                /* use sysDescription as a backup if sysUptime fails */
                                                if ((oid = strdup(".1.3.6.1.2.1.1.1.0")) == NULL) {
                                                        die("ERROR: malloc(): strdup() oid ping.c failed");
                                                }
                                        }

                                        free(poll_result);
                                        num_oids_checked++; /* <================ */
                                        goto retry;
                                }else{
                                        snprintf(ping->snmp_response, SMALL_BUFSIZE, "Host did not respond to SNMP");
                                        free(poll_result);
                                        return HOST_DOWN;
                                }
                        }else{
--------------
To my tired eyes, the check on num_oids_checked seems to be wrong.
At the beginning of the function, num_oids_checked is initialized to zero, so the original test ( > 1 ) is never true: the retry branch is never entered, the counter never gets incremented, and the host is reported down after the first failed OID. So I have changed it to " < 2 " and it seems to be working for me now.
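
To make the difference between the two checks concrete, here is a minimal standalone sketch of the retry logic only. It is not spine code: ping_snmp_sketch, probe_fn, and the two simulated hosts are hypothetical names made up for illustration, and the probe simply stands in for the snmp_getnext() call, assuming attempt 0 is the generic OID, 1 is sysUptime, and 2 is sysDescription.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for snmp_getnext(): returns 1 if the simulated
 * host answers the OID tried at this attempt index, 0 otherwise. */
typedef int (*probe_fn)(int attempt);

enum { SKETCH_HOST_DOWN = 0, SKETCH_HOST_UP = 1 };

/* Simulated host that answers only the sysUptime fallback (attempt 1). */
static int answers_only_sysuptime(int attempt) { return attempt == 1; }

/* Simulated host that never answers any OID. */
static int never_answers(int attempt) { (void)attempt; return 0; }

/* Models only the counter logic of ping_snmp().
 * patched == 0: original check (num_oids_checked > 1)
 * patched != 0: changed check  (num_oids_checked < 2)
 * The number of OIDs actually tried is written to *attempts_out. */
static int ping_snmp_sketch(probe_fn probe, int patched, int *attempts_out) {
    int num_oids_checked = 0;
    int attempts = 0;

    assert(probe != NULL && attempts_out != NULL);

    for (;;) {
        attempts++;
        if (probe(num_oids_checked)) {
            *attempts_out = attempts;
            return SKETCH_HOST_UP;
        }

        /* Decide whether to fall back to the next OID. */
        int retry = patched ? (num_oids_checked < 2)
                            : (num_oids_checked > 1);
        if (!retry) {
            *attempts_out = attempts;
            return SKETCH_HOST_DOWN;
        }
        num_oids_checked++;
    }
}
```

Under these assumptions, calling ping_snmp_sketch(answers_only_sysuptime, 0, &n) gives up after a single attempt and reports the host down, while passing 1 for patched moves on to the sysUptime fallback and reports the host up on the second attempt. A host that never answers is tried three times before being marked down.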

If I'm wrong about this, let me know.
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Re: Spine/Poller Issue

Post by BSOD2600 »

Are you using spine 0.8.7g plus its latest patches?
wkchu
Posts: 2
Joined: Mon Aug 02, 2010 11:33 pm
Location: Kelana Jaya, Malaysia

Re: Spine/Poller Issue

Post by wkchu »

I'm using 0.8.7g plus unified_issues.patch.

I scanned through svn.cacti.net, and it seems the " > 1 " check in that if statement was introduced in Revision 3844, Dec 31 2006. Is nobody using SNMP for the availability check with spine?