Spine Seg fault

mxtommy
Posts: 17
Joined: Fri Mar 20, 2009 10:15 am

Post by mxtommy »

Hi,

I was wondering if someone could help me troubleshoot this issue.

I just added a new host to cacti. It's a VOIP box with non-standard snmp settings.

We had configured the device along with its data sources etc. before realizing that the SNMP box wasn't permitting our IP (duh). However, when we added the IP to the allow list, spine immediately started segfaulting.

I ran a poller cycle in debug mode, and found this:

03/20/2009 10:55:06 AM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
03/20/2009 10:55:06 AM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 30
03/20/2009 10:55:06 AM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
03/20/2009 10:55:06 AM - SPINE: Poller[0] Host[157] PING: Result ICMP: Host is Alive
03/20/2009 10:55:06 AM - SPINE: Poller[0] Host[157] RECACHE: Processing 1 items in the auto reindex cache for 'IP_ADDR_OF_VOIP_BOX'
03/20/2009 10:55:06 AM - SPINE: Poller[0] FATAL: Spine Encountered a Segmentation Fault (Spine thread)

I disabled the new VOIP box in cacti and the problem went away.

OS: CentOS 5.2
Cacti version: 0.8.7d
Spine version: 0.8.7c
netsnmp version: net-snmp-5.3.1-24.el5_2.2

I couldn't try with cmd.php as we have too many hosts (it takes too long).

The snmp settings I mentioned are:

snmp version 1
snmp port 162
custom OIDs .1.3.6.1.4.1.25060.1.1 and .1.3.6.1.4.1.25060.1.6

A manual snmpget seems to work fine:

# snmpget -v1 -c OUR_COMMUNITY OUR_IP:162 1.3.6.1.4.1.25060.1.6
SNMPv2-SMI::enterprises.25060.1.6 = INTEGER: 9

Is there any way of narrowing the problem further?

Thanks

Post by mxtommy »

Does no one have any idea how to debug this? :) Surely I can't be the only person out there to have encountered this. :o

Is there some way of configuring spine so that it will output more detail?

Thanks!
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Post by gandalf »

When running spine in verbosity=5 against that host only, what happens?
Reinhard
TheWitness
Developer
Posts: 17059
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

I'm checking in a new SVN. I reviewed the code and see what might be a few corner cases. Please test tomorrow.

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?

Post by mxtommy »

Thanks for the tip! I didn't know one could run spine manually on a single host.

With the host disabled, spine didn't segfault, though it didn't really do anything (expected behavior).

I reactivated the host, and spine segfaulted again. (This is still with 0.8.7c)

Verbosity 5 didn't seem to add any new information:

Code: Select all

DEBUG: In Poller, About to Start Polling of Host
Host[0] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
DEBUG: The Value of Active Threads is 1
Host[157] PING: Result ICMP: Host is Alive
Host[157] RECACHE: Processing 1 items in the auto reindex cache for 'OURIP'
FATAL: Spine Encountered a Segmentation Fault (Spine thread)

I ran an strace on the spine process and got the following:

Code: Select all

[pid 11697] write(6, "R\1\0\0\3UPDATE host SET status=\'3\',"..., 342) = 342
[pid 11697] read(6, "0\0\0\1\0\1\0\2\0\1\0(Rows matched: 1  Cha"..., 16384) = 52
[pid 11697] poll([{fd=6, events=POLLIN|POLLPRI}], 1, 0) = 0
[pid 11697] write(6, "[\0\0\0\3SELECT data_query_id, actio"..., 95) = 95
[pid 11697] read(6, "\1\0\0\1\5U\0\0\2\3def\tcacti_new\16poller_r"..., 16384) = 443
[pid 11697] sendmsg(7, {msg_name(16)={sa_family=AF_INET, sin_port=htons(162), sin_addr=inet_addr("OURIP")}, msg_iov(1)=[{"0,\2\1\0\4\tOURCOMMUNITY\240\34\2\4sviP\2\1\0\2\1\0000\16"..., 46}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 46
[pid 11697] gettimeofday({1238069175, 564821}, NULL) = 0
[pid 11697] gettimeofday({1238069175, 564905}, NULL) = 0
[pid 11697] select(8, [7], NULL, NULL, {0, 499916}) = 1 (in [7], left {0, 499916})
[pid 11697] recvmsg(7, {msg_name(16)={sa_family=AF_INET, sin_port=htons(162), sin_addr=inet_addr("OURIP")}, msg_iov(1)=[{"0\36\2\1\0\4\tOURCOMMUNITY\242\16\2\4sviP\2\1\0\2\1\0000\0"..., 65536}], msg_controllen=0, msg_flags=0}, 0) = 32
[pid 11697] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
[pid 11697] rt_sigaction(SIGSEGV, {SIG_DFL}, {0x8054a80, [SEGV], SA_RESTART}, 8) = 0
[pid 11697] write(1, "FATAL: Spine Encountered a Segme"..., 61FATAL: Spine Encountered a Segmentation Fault (Spine thread)
) = 61

So it seems to be segfaulting right after recvmsg.
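
The reply bytes in the recvmsg line can be decoded by hand to see what the device actually sent. Below is a minimal Python sketch (not part of spine) that walks the BER structure of that 32-byte datagram and counts the variable bindings. Assumptions: the community string is redacted in the trace, so a 9-byte placeholder `COMMUNITY` stands in for it (the `\t` length octet is 9), the bytes shown are the whole datagram (the reported length of 32 matches), and only short-form BER lengths are handled.

```python
# Sketch: decode the SNMPv1 reply captured by strace and count its varbinds.

def read_tlv(buf, pos):
    """Read one BER TLV with a short-form (single byte) length."""
    tag = buf[pos]
    length = buf[pos + 1]
    if length & 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    start = pos + 2
    return tag, buf[start:start + length], start + length


def count_varbinds(packet):
    """Return (pdu_tag, error_status, number_of_varbinds) for an SNMPv1 message."""
    tag, message, _ = read_tlv(packet, 0)           # outer SEQUENCE
    if tag != 0x30:
        raise ValueError("not a BER SEQUENCE")
    pos = 0
    _, _version, pos = read_tlv(message, pos)       # INTEGER version (0 = v1)
    _, _community, pos = read_tlv(message, pos)     # OCTET STRING community
    pdu_tag, pdu, _ = read_tlv(message, pos)        # 0xA2 = GetResponse
    pos = 0
    _, _request_id, pos = read_tlv(pdu, pos)
    _, error_status, pos = read_tlv(pdu, pos)
    _, _error_index, pos = read_tlv(pdu, pos)
    _, varbind_list, _ = read_tlv(pdu, pos)         # SEQUENCE OF VarBind
    count, pos = 0, 0
    while pos < len(varbind_list):
        _, _, pos = read_tlv(varbind_list, pos)
        count += 1
    return pdu_tag, int.from_bytes(error_status, "big"), count


# strace's octal escapes rewritten as hex; placeholder community (assumption).
reply = (b"\x30\x1e\x02\x01\x00\x04\x09" + b"COMMUNITY"
         + b"\xa2\x0e\x02\x04sviP\x02\x01\x00\x02\x01\x00\x30\x00")

pdu_tag, error_status, varbinds = count_varbinds(reply)
print(hex(pdu_tag), error_status, varbinds)
```

If this reading of the trace is right, the device answered with a GetResponse, error-status 0, and an empty varbind list (the trailing `0\0`), so a poller that reads the first returned variable without checking for one would have nothing to read.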

I checked out the latest SVN and it's segfaulting at the exact same spot.

Also, admittedly this is the first time I've used SVN, but I had some problems compiling spine.

I had to overwrite the following files with the versions from spine-0.8.7c, as they were generating errors. In all three cases the files were calling "link" with only one argument, but link takes two arguments. Am I doing something wrong? :)

config/config.sub
-------
configure: error: cannot run /bin/sh config/config.sub
#/bin/sh config/config.sub
link: missing operand after `/usr/share/libtool/config.sub'


config/config.guess
-------
#./configure
checking build system type... link: missing operand after `/usr/share/libtool/config.guess'

(configure finished successfully after replacing these two files)

libtool
-------
#make
........
/bin/sh ./libtool --tag=CC --mode=link gcc -I/usr/include/net-snmp -I/usr/include/net-snmp/.. -I/usr/include/mysql -g -O2 -L/usr/lib -L/usr/lib/mysql -o spine sql.o spine.o util.o snmp.o locks.o poller.o nft_popen.o php.o ping.o keywords.o error.o -lnetsnmp -lmysqlclient_r -lmysqlclient_r -lcrypto -lz -lpthread -lm
link: missing operand after `/usr/share/libtool/ltmain.sh'


After replacing these three files with the versions from 0.8.7c, spine compiled fine.
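
A possible explanation, which is an assumption and was not confirmed in this thread: Subversion stores a symlink as a small special file whose entire content is `link TARGET`. If a checkout or export leaves such a file as plain text, running it through /bin/sh executes the `link` command with a single operand, which would match the "missing operand" errors above. The Python sketch below scans a source tree for files of that shape; the function name and the demo directory layout are made up for illustration.

```python
import os
import tempfile


def find_link_placeholders(root):
    """Return (path, target) for regular files whose only content is 'link TARGET'."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Real symlinks are fine; placeholders are tiny plain files.
            if os.path.islink(path) or os.path.getsize(path) > 4096:
                continue
            with open(path, "rb") as fh:
                data = fh.read()
            if data.startswith(b"link ") and b"\n" not in data.strip():
                hits.append((path, data[5:].strip().decode(errors="replace")))
    return sorted(hits)


# Demo on a throwaway directory that mimics one of the files from the post.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "config"))
    with open(os.path.join(root, "config", "config.sub"), "w") as fh:
        fh.write("link /usr/share/libtool/config.sub")
    with open(os.path.join(root, "configure.ac"), "w") as fh:
        fh.write("AC_INIT(spine)\n")
    found = [(os.path.relpath(p, root), t) for p, t in find_link_placeholders(root)]

print(found)
```

Any file it reports would need to be replaced by the real file it points to, which is effectively what copying the three files from the 0.8.7c tarball did.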

Thanks for looking at this, I REALLY appreciate it!

Post by TheWitness »

Can you deploy 0.8.7d-pre2 and test again? It can be found in the announcement forum. Then dual post for those in both threads.

TheWitness

Post by mxtommy »

By dual post I assume you mean I should update both threads, if not then sorry in advance :)

There's no change with pre2. It still segfaults at the same spot.

Thanks, Thomas

Post by TheWitness »

mxtommy wrote: By dual post I assume you mean I should update both threads, if not then sorry in advance :)

There's no change with pre2. It still segfaults at the same spot.

Thanks, Thomas

Can you do a GoToMeeting in the morning, around 7:00 EST (GMT-5)?

TheWitness

Post by mxtommy »

Sorry, I didn't see your post in time. I could do one on Monday though, if you want. Let me know!

Thanks!

Post by TheWitness »

Monday won't work. How about Sunday?

Post by mxtommy »

Sure, Sunday would work for me. Were you thinking the same time (7am)? At any rate, I'm free all day, so whatever works best for you :)

Perhaps we should take this to PMs?

Thanks!

Post by TheWitness »

Agreed, watch your email. It's dinner time, so you won't see anything till sometime Saturday afternoon.

TheWitness

Post by TheWitness »

OK, mxtommy and I resolved his problem. It was interesting: an SNMP-enabled host that responded to a sysUptime request with no SNMP errors, but with a null response (blank string).

So, I have revised snmp.c to handle this corner case. I will post here and this will be either in the release code or the pre3, whichever I decide to do.
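
The attached snmp.c is not reproduced here. Purely as an illustration of the corner case, the following Python sketch shows the kind of guard involved: a reply that reports no error but carries no usable value is treated the same as a failed poll. The `Response` class and `first_value` function are hypothetical stand-ins and do not correspond to names in spine or net-snmp; `"U"` is RRDtool's marker for an unknown sample.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Response:
    """Hypothetical stand-in for a decoded SNMP reply."""
    error_status: int = 0
    varbinds: List[Tuple[str, Optional[str]]] = field(default_factory=list)


UNKNOWN = "U"  # RRDtool's marker for an unknown sample


def first_value(resp: Optional[Response]) -> str:
    """Return the first polled value, or UNKNOWN if there is nothing usable."""
    if resp is None:                  # timeout, nothing received
        return UNKNOWN
    if resp.error_status != 0:        # agent reported an SNMP error
        return UNKNOWN
    if not resp.varbinds:             # the corner case: no error, but no data
        return UNKNOWN
    _oid, value = resp.varbinds[0]
    if value is None or value == "":  # a variable came back, but it is blank
        return UNKNOWN
    return value


print(first_value(Response(varbinds=[(".1.3.6.1.2.1.1.3.0", "123456")])))
print(first_value(Response()))
```

The point of the sketch is only that "no error" and "has data" are separate checks; a reply can pass the first and still fail the second.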

NOTE: I STILL NEED AN SNMPV3 TESTER! One who can do an online session as was done with mxtommy today.

Thanks Thomas!!

TheWitness
Attachments
snmp.c
(20.38 KiB) Downloaded 438 times

Post by mxtommy »

Hi!

I'd like to publicly thank TheWitness for his help this morning :D I learned a lot about debugging spine in the process too.

So thanks to TheWitness!
soloslinger
Posts: 32
Joined: Fri Jan 19, 2007 2:11 pm

Post by soloslinger »

I am having a similar problem. If folks are still reading this thread, I won't have to create a new one.

I am running spine and having segfaults as well, though my cacti log doesn't show it in the way the original poster does. Some of my graphs are updated every minute, others are updated every five.
The seg faults occur every 5 minutes and it doesn't update my graphs while this happens.

Code: Select all

This GDB was configured as "i386-marcel-freebsd".
Core was generated by `spine'.
Program terminated with signal 11, Segmentation fault.
#0  0x282c7537 in ?? ()

Code: Select all

03/31/2009 05:49:59 AM - SYSTEM STATS: Time:118.2700 Method:spine Processes:2 Threads:1 Hosts:36 HostsPerProcess:18 DataSources:342 RRDsProcessed:156
03/31/2009 05:49:59 AM - SYSTEM STATS: Time:0.3631 Method:spine Processes:2 Threads:1 Hosts:36 HostsPerProcess:18 DataSources:342 RRDsProcessed:49
03/31/2009 05:50:03 AM - SYSTEM STATS: Time:2.6043 Method:spine Processes:2 Threads:1 Hosts:36 HostsPerProcess:18 DataSources:420 RRDsProcessed:208
03/31/2009 05:51:04 AM - SYSTEM STATS: Time:3.5402 Method:spine Processes:2 Threads:1 Hosts:36 HostsPerProcess:18 DataSources:420 RRDsProcessed:170
03/31/2009 05:52:05 AM - SYSTEM STATS: Time:4.5735 Method:spine Processes:2 Threads:1 Hosts:36 HostsPerProcess:18 DataSources:420 RRDsProcessed:175
03/31/2009 05:54:59 AM - SYSTEM STATS: Time:118.2402 Method:spine Processes:2 Threads:1 Hosts:36 HostsPerProcess:18 DataSources:420 RRDsProcessed:156
03/31/2009 05:54:59 AM - SYSTEM STATS: Time:0.3494 Method:spine Processes:2 Threads:1 Hosts:36 HostsPerProcess:18 DataSources:420 RRDsProcessed:49
03/31/2009 05:55:03 AM - SYSTEM STATS: Time:2.6377 Method:spine Processes:2 Threads:1 Hosts:36 HostsPerProcess:18 DataSources:338 RRDsProcessed:209

And when this happens, the cacti log just repeats every second:

Code: Select all

03/31/2009 01:59:38 AM - CMDPHP: Poller[0] DEBUG: SQL Assoc: "select poller_id,end_time from poller_time where poller_id=0"
03/31/2009 01:59:38 AM - CMDPHP: Poller[0] DEBUG: SQL Assoc: "select  poller_output.output,  poller_output.time,  poller_output.local_data_id,  poller_item.rrd_path,  poller_item.rrd_name,  poller_item.rrd_num  from (poller_output,poller_item)  where (poller_output.local_data_id=poller_item.local_data_id and poller_output.rrd_name=poller_item.rrd_name)  LIMIT 10000"

This is the only warning in the cacti log relating to this.

Code: Select all

03/31/2009 05:56:00 AM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Potential Data Source Issues for Data Sources: traffic_in(DS[746]), traffic_out(DS[746]), traffic_in(DS[747]), traffic_out(DS[747]), traffic_in(DS[777]), traffic_out(DS[777]), traffic_in(DS[778]), traffic_out(DS[778]), temp(DS[796]), temp(DS[801]), traffic_in(DS[816]), traffic_out(DS[816]), traffic_in(DS[843]), traffic_out(DS[843]), traffic_in(DS[853]), traffic_out(DS[853]), traffic_in(DS[866]), traffic_out(DS[866]), traffic_in(DS[871]), traffic_out(DS[871]), traffic_in(DS[875]), traffic_out(DS[875]), traffic_in(DS[876]), traffic_out(DS[876]), traffic_in(DS[877]), traffic_out(DS[877]) 

Is this a related issue? Any ideas?

soloslinger
plugin: 2.1
cacti: 0.8.7b
apache 2.0
php: 5.2.5
mysql: 5.1
os: freebsd 6.2