Code: Select all
[root@netinfo01 log]# uname -a
Linux netinfo01.ns.XXX.YYY 2.6.32-220.4.1.el6.x86_64 #1 SMP Thu Jan 19 14:50:54 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
One interesting thing is that some graphs stopped between 23:00 on Wednesday and 00:00 on Thursday, while others stopped at around 05:00 on Wednesday. Both of these would have been at times when I was not actively working on the machine.
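To pin down exactly when each graph stopped updating, I can check the RRD files directly rather than eyeballing the graphs. A minimal sketch, assuming the default rra directory under my cacti path and a placeholder filename:

Code: Select all
# most recently written RRDs sort to the top
ls -lt /var/www/html/stats/rra | head
# epoch timestamp of the last update for one data source
rrdtool last /var/www/html/stats/rra/some_datasource_123.rrd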
I've also noticed the following entries in the logs:
Code: Select all
05/25/2012 05:15:01 PM - SPINE: Poller[0] FATAL: Spine Encountered a Segmentation Fault (Spine thread)
05/25/2012 05:15:01 PM - SPINE: Poller[0] ERROR: The System Lacked the Resources to Create a Thread
05/25/2012 05:15:01 PM - SPINE: Poller[0] ERROR: The System Lacked the Resources to Create a Thread
05/25/2012 05:15:01 PM - SPINE: Poller[0] ERROR: The System Lacked the Resources to Create a Thread
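My reading of "The System Lacked the Resources to Create a Thread" is that pthread_create() is failing inside spine, which would point at either a per-user process/thread cap or memory exhaustion. A rough sketch of the checks I'm running to see which limit is in play (EL6 also ships a default nproc cap in /etc/security/limits.d/90-nproc.conf that I want to rule out):

Code: Select all
# per-user limits for the account spine/poller runs under
su - cactiuser -c 'ulimit -u -v'   # max user processes and virtual memory
# system-wide ceilings
cat /proc/sys/kernel/threads-max
cat /proc/sys/kernel/pid_max
# how close the box is to those limits right now
ps -eLf | wc -l                    # total threads on the system
ps -L -u cactiuser | wc -l         # threads owned by cactiuser

A top snapshot taken shortly after those errors: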
Code: Select all
top - 17:32:04 up 115 days, 2:25, 2 users, load average: 0.00, 0.00, 0.00
Tasks: 1107 total, 3 running, 1104 sleeping, 0 stopped, 0 zombie
Cpu(s): 31.0%us, 5.7%sy, 0.0%ni, 63.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4057416k total, 3929348k used, 128068k free, 20292k buffers
Swap: 8388600k total, 1394616k used, 6993984k free, 158168k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23635 mysql 20 0 695m 24m 4592 S 23.7 0.6 22:06.31 mysqld
31438 cactiuse 20 0 162m 16m 6176 R 11.2 0.4 0:13.94 php
31558 root 20 0 15888 2068 972 R 0.3 0.1 0:00.12 top
1 root 20 0 19272 744 560 S 0.0 0.0 0:04.93 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
4 root 20 0 0 0 0 R 0.0 0.0 0:00.01 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
6 root RT 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
7 root 20 0 0 0 0 S 0.0 0.0 0:00.40 events/0
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuset
9 root 20 0 0 0 0 S 0.0 0.0 0:00.11 khelper
10 root 20 0 0 0 0 S 0.0 0.0 0:00.00 netns
11 root 20 0 0 0 0 S 0.0 0.0 0:00.00 async/mgr
12 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pm
13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 sync_supers
14 root 20 0 0 0 0 S 0.0 0.0 0:00.00 bdi-default
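That snapshot shows 1107 tasks, almost no free memory, and roughly 1.3 GB of swap in use on a 4 GB box, so something appears to accumulate over time. A few follow-up checks to see where the processes and memory are going (the ps output formats are just what I find readable):

Code: Select all
# process count per user -- who owns most of those 1107 tasks
ps -eo user= | sort | uniq -c | sort -rn | head
# biggest memory consumers
ps aux --sort=-%mem | head -15
# leftover php/spine processes that never exited
ps -u cactiuser -o pid,stat,etime,args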
An strace of the running poller.php process shows it waiting for something, until it gets killed by the next 5-minute poller run.
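For reference, this is roughly how I'm attaching to the stuck poller (the PID below is just an example):

Code: Select all
# find the poller PID without matching the grep itself
ps -u cactiuser -o pid,etime,args | grep '[p]oller.php'
# attach, follow forked children, and log syscalls to a file
strace -f -tt -p 31438 -o /tmp/poller.strace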
I activated the 'domains' and 'spikekill' plugins during the day on Wednesday, but I've deactivated both of them since, to eliminate them as variables while I work on this larger problem.
poller.php is running as 'cactiuser', and cactiuser owns all of the files in the cacti directory structure (/var/www/html/stats).
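To double-check that, the following should list anything under the tree that is not owned by cactiuser:

Code: Select all
find /var/www/html/stats ! -user cactiuser -ls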
So... at this point, I'm just trying to get a handle on what's happening, and what I can do to fix it / keep it from happening again.