My Spine patches...

Post by Alice »

Patch #1

Sometimes spine crashed with a segmentation fault. Tested with 0.8.7g and the SVN version.
With GDB, I traced it to the following line:

poller.c, function poll_device:


--- /z/spine/branches/main/poller.c     2011-03-24 11:53:58.000000000 +0200
+++ poller.c    2011-03-25 01:09:41.000000000 +0200
@@ -1052,7 +1085,7 @@
                                        }
                                }

-                               free(poll_result);
+//                             free(poll_result);

                                SPINE_LOG_MEDIUM(("Device[%i] TH[%i] DS[%i] SS[%i] SERVER: %s, output: %s", device_id, device_thread, poller_items[i].local_data_id, php_process, poller_items[i].arg1, poller_items[i].result));

With this change the segmentation faults are gone, and memory usage does not seem to increase during polling.

Patch #2

Sometimes, some devices do not answer SNMP queries right away.
I've made the following dirty fix. It works: about 90% of the time, polling is now successful.
Of course, it could be made a lot cleaner, but I don't really have the time for it:


--- /z/spine/branches/main/poller.c     2011-03-24 11:53:58.000000000 +0200
+++ poller.c    2011-03-25 01:09:41.000000000 +0200
@@ -888,7 +888,23 @@

                                        if (num_oids > 0) {
                                                snmp_get_multi(device, snmp_oids, num_oids);
-
+                                               if (device->ignore_device)
+//Try again, after a delay of 0.05sec
+                                            {
+                                               device->ignore_device=FALSE;usleep(50000);
+                                                snmp_get_multi(device, snmp_oids, num_oids);
+                                               SPINE_LOG(("Device[%i] TH[%i] DS[%i] WARNING: SNMP timeout detected [%i ms], retrying device '%s' [#1]", device_id, device_thread, poller_items[snmp_oids[j].array_position].local_data_id, device->snmp_timeout, device->hostname));
+                                            }
+
+
+                                               if (device->ignore_device)
+//Try yet again, this time after a delay of 0.5 sec.
+                                            {
+                                               device->ignore_device=FALSE;usleep(500000);
+                                                snmp_get_multi(device, snmp_oids, num_oids);
+                                               SPINE_LOG(("Device[%i] TH[%i] DS[%i] WARNING: SNMP timeout detected [%i ms], retrying device '%s' [#2]", device_id, device_thread, poller_items[snmp_oids[j].array_position].local_data_id, device->snmp_timeout, device->hostname));
+                                            }
+
                                                for (j = 0; j < num_oids; j++) {
                                                        if (device->ignore_device) {
                                                                SPINE_LOG(("Device[%i] TH[%i] DS[%i] WARNING: SNMP timeout detected [%i ms], ignoring device '%s'", device_id, device_thread, poller_items[snmp_oids[j].array_position].local_data_id, device->snmp_timeout, device->hostname));
@@ -947,6 +963,23 @@

                                if (num_oids >= device->max_oids) {
                                        snmp_get_multi(device, snmp_oids, num_oids);
+                                               if (device->ignore_device)
+//Try again, after a delay of 0.05sec
+                                            {
+                                               device->ignore_device=FALSE;usleep(50000);
+                                                snmp_get_multi(device, snmp_oids, num_oids);
+                                               SPINE_LOG(("Device[%i] TH[%i] DS[%i] WARNING: SNMP timeout detected [%i ms], retrying device '%s' [#3]", device_id, device_thread, poller_items[snmp_oids[j].array_position].local_data_id, device->snmp_timeout, device->hostname));
+                                            }
+
+
+                                               if (device->ignore_device)
+//Try yet again, this time after a delay of 0.5 sec.
+                                            {
+                                               device->ignore_device=FALSE;usleep(500000);
+                                                snmp_get_multi(device, snmp_oids, num_oids);
+                                               SPINE_LOG(("Device[%i] TH[%i] DS[%i] WARNING: SNMP timeout detected [%i ms], retrying device '%s' [#4]", device_id, device_thread, poller_items[snmp_oids[j].array_position].local_data_id, device->snmp_timeout, device->hostname));
+
+                                            }

                                        for (j = 0; j < num_oids; j++) {
                                                if (device->ignore_device) {
@@ -1077,6 +1110,22 @@
                /* process last multi-get request if applicable */
                if (num_oids > 0) {
                        snmp_get_multi(device, snmp_oids, num_oids);
+                                               if (device->ignore_device)
+//Try again, after a delay of 0.05sec
+                                            {
+                                               device->ignore_device=FALSE;usleep(50000);
+                                                snmp_get_multi(device, snmp_oids, num_oids);
+                                               SPINE_LOG(("Device[%i] TH[%i] DS[%i] WARNING: SNMP timeout detected [%i ms], retrying device '%s' [#5]", device_id, device_thread, poller_items[snmp_oids[j].array_position].local_data_id, device->snmp_timeout, device->hostname));
+                                            }
+
+
+                                               if (device->ignore_device)
+//Try yet again, this time after a delay of 0.5 sec.
+                                            {
+                                               device->ignore_device=FALSE;usleep(500000);
+                                                snmp_get_multi(device, snmp_oids, num_oids);
+                                               SPINE_LOG(("Device[%i] TH[%i] DS[%i] WARNING: SNMP timeout detected [%i ms], retrying device '%s' [#6]", device_id, device_thread, poller_items[snmp_oids[j].array_position].local_data_id, device->snmp_timeout, device->hostname));
+                                            }

                        for (j = 0; j < num_oids; j++) {
                                if (device->ignore_device) {
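
The six near-identical retry blocks above could be folded into one helper. What follows is only a minimal sketch of that idea, not code from spine: `demo_device_t`, `demo_get_multi` and `get_multi_with_retry` are hypothetical stand-ins, and only the `ignore_device` flag and the `usleep` delays are taken from the patch. The logging call is left out on purpose, because the retry messages in the patch index `snmp_oids[j]` before the `for` loop that sets `j`, and the value of `j` at that point is not visible in the hunks.

```c
#define _DEFAULT_SOURCE   /* expose usleep() on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>       /* usleep */

/* Hypothetical stand-in for spine's device structure; only the flag
 * used by the patch is modelled here. */
typedef struct {
    int ignore_device;    /* set by the query function on SNMP timeout */
    int failures_left;    /* demo hook: how many more queries time out */
} demo_device_t;

/* Hypothetical stand-in for snmp_get_multi(): "times out" while
 * failures_left > 0, then succeeds without touching the flag. */
static void demo_get_multi(demo_device_t *dev) {
    if (dev->failures_left > 0) {
        dev->failures_left--;
        dev->ignore_device = 1;
    }
}

/* Query once, then retry after each delay in delays_us[] for as long as
 * the device is still flagged. Returns the number of retries performed.
 * If every retry times out, ignore_device is left set for the caller. */
static int get_multi_with_retry(demo_device_t *dev,
                                const unsigned int *delays_us,
                                size_t n_delays) {
    size_t attempt;

    assert(dev != NULL);
    demo_get_multi(dev);

    for (attempt = 0; attempt < n_delays && dev->ignore_device; attempt++) {
        dev->ignore_device = 0;          /* give the device another chance */
        usleep(delays_us[attempt]);      /* back off before asking again */
        demo_get_multi(dev);
    }
    return (int)attempt;
}
```

With a delay table of `{50000, 500000}` this would reproduce the 0.05 s and 0.5 s back-off from the patch, and each of the three call sites would shrink to a single call.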

Post by rony »

The first patch is just fixing the symptom.

The segmentation fault is probably caused by a resource within poll_result that is still being referenced (through a pointer) by another struct.

As for the second patch, I'm pretty sure the global and device-specific SNMP timeouts should take care of this. Have you tried adjusting those in the Cacti interface to see if that helps?
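
To illustrate the kind of fix that would address the cause rather than the symptom, here is a minimal sketch in C. It assumes the crash comes from a pointer into the freed buffer surviving the `free()`, which is only a guess; `demo_item_t`, `DEMO_RESULT_SIZE` and `store_and_free` are hypothetical names and do not reflect spine's actual data structures. The idea is that the item keeps its own copy of the text, so the temporary buffer can still be freed and nothing leaks.

```c
#include <assert.h>
#include <stdio.h>    /* snprintf */
#include <stdlib.h>   /* free */

#define DEMO_RESULT_SIZE 64   /* hypothetical size of the per-item buffer */

/* Hypothetical poller item that owns its result text outright. */
typedef struct {
    char result[DEMO_RESULT_SIZE];
} demo_item_t;

/* Copy src into the item's own buffer (truncating if it is too long),
 * then free src. After this call the item holds no pointer into src,
 * so freeing it cannot leave a dangling reference behind. */
static void store_and_free(demo_item_t *item, char *src) {
    assert(item != NULL);

    if (src == NULL) {
        item->result[0] = '\0';   /* nothing to copy, store empty text */
        return;
    }
    snprintf(item->result, sizeof item->result, "%s", src);
    free(src);
}
```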

Post by Alice »

1st patch: possible, but now it works OK. Without it, I was getting a lot of poller timeouts.

2nd patch: yes, of course I've tried. My SNMP timeout is set to 500 ms, and I've tried increasing it up to 3 seconds. I still got about the same number of 'random' timeouts.