I just moved Cacti from a physical server to a VMware virtual machine. It's running well: I've got 8 x 2 GHz CPUs, 12 GB RAM and some SAN disk behind that. I'm running 64-bit FreeBSD 9-RELEASE #0, along with rrdtool-1.4.5, php5-5.3.10_1, mysql-server-5.5.22 and apache-2.2.22_5.
Boost is from SVN, rev2059. Cacti is the official download without any official patches (I don't know if any have been released yet), and Spine is also the latest from SVN: r7125.
My problem is that sometimes, when Boost runs, poller.php seems to collide with it and falls into the "disk wait" state, while the running rrdtool process (I'm not sure whether it belongs to Boost or to poller.php) stays in an "idle" state. Occasionally (this has happened 2-3 times since the machine went live 2-3 weeks ago), every Boost run triggers this behavior and the process count keeps increasing. The Boost process itself then exits cleanly and disappears.
I can't really tell whether a particular host or script is running slowly and causing this, and I'm not keen on running a full debug of ~600 hosts and sifting through the output, especially since this doesn't happen consistently.
Any help or ideas would be appreciated!
Apparently some poller.php processes do exit on their own after a while: the poller.php below had a second one running alongside it yesterday before I went home. This process, however, started yesterday, and I'm not sure why it still shows today's timestamp.
Code:
[david@cacti01 ~]$ ps aux
USER   PID %CPU %MEM    VSZ   RSS TT  STAT STARTED    TIME COMMAND
www  20008  0.0  0.1 280512 10360 ??  Ds    9:00AM 0:07.37 /usr/local/bin/php /usr/local/share/cacti/poller.php
www  20056  0.0  0.0  82600  2480 ??  I     9:00AM 0:00.02 /usr/local/bin/rrdtool -
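To spot stuck processes like these without eyeballing `ps aux` every time, a minimal Python sketch could filter the output for processes whose STAT field starts with `D` (disk wait) or `I` (idle, which FreeBSD's ps(1) describes as sleeping for longer than about 20 seconds) and whose command mentions poller.php or rrdtool. The function name `find_stuck` and the `WATCHED` list are hypothetical, chosen only for illustration, and the parser assumes the BSD-style `ps aux` column layout shown above.

```python
# Commands to watch (hypothetical list for this sketch).
WATCHED = ("poller.php", "rrdtool")


def find_stuck(ps_output: str) -> list[tuple[str, str, str]]:
    """Return (pid, stat, command) for watched processes in D or I state.

    Assumes BSD-style `ps aux` columns:
    USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND,
    where COMMAND may itself contain spaces.
    """
    stuck = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # keep COMMAND (11th field) whole
        if len(fields) < 11:
            continue
        pid, stat, command = fields[1], fields[7], fields[10]
        # First STAT letter: D = disk wait, I = idle (sleeping > ~20 s)
        if stat[:1] in ("D", "I") and any(w in command for w in WATCHED):
            stuck.append((pid, stat, command))
    return stuck
```

On the Cacti box it could be fed live output with `find_stuck(subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout)` and run from cron, so that a stuck poller shows up without waiting for the process count to climb.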
Code:
04/24/2012 09:01:11 AM - CMDPHP: Poller[0] ERROR: SQL Assoc Failed!, Error:'2006', SQL:"SELECT 1 AS id, ph.name, ph.file, ph.function FROM plugin_hooks AS ph LEFT JOIN plugin_config AS pc ON pc.directory=ph.name WHERE ph.status = 1 AND hook = 'poller_bottom' AND ph.name IN ('settings', 'boost', 'dsstats') UNION SELECT pc.id, ph.name, ph.file, ph.function FROM plugin_hooks AS ph LEFT JOIN plugin_config AS pc ON pc.directory=ph.name WHERE ph.status = 1 AND hook = 'poller_bottom' AND ph.name NOT IN ('settings', 'boost', 'dsstats') ORDER BY id ASC"
04/24/2012 09:01:11 AM - CMDPHP: Poller[0] ERROR: A DB Exec Failed!, Error:'2006', SQL:"REPLACE INTO settings (name,value) VALUES ('stats_recache','RecacheTime:0.0 HostsRecached:0')'
04/24/2012 09:01:11 AM - CMDPHP: Poller[0] ERROR: SQL Cell Failed!, Error:'2006', SQL:"SELECT COUNT(*) FROM poller_command"
04/24/2012 09:01:11 AM - CMDPHP: Poller[0] ERROR: A DB Exec Failed!, Error:'2006', SQL:"REPLACE INTO settings (name,value) VALUES ('stats_poller','Time:86471.2080 Method:spine Processes:8 Threads:10 Hosts:467 HostsPerProcess:59 DataSources:69193 RRDsProcessed:0')'
04/24/2012 09:01:11 AM - SYSTEM STATS: Time:86471.2080 Method:spine Processes:8 Threads:10 Hosts:467 HostsPerProcess:59 DataSources:69193 RRDsProcessed:0
04/24/2012 09:01:11 AM - POLLER: Poller[0] Maximum runtime of 298 seconds exceeded. Exiting.
04/24/2012 09:01:11 AM - CMDPHP: Poller[0] ERROR: SQL Assoc Failed!, Error:'2006', SQL:"select poller_output.output, poller_output.time, UNIX_TIMESTAMP(poller_output.time) as unix_time, poller_output.local_data_id, poller_item.rrd_path, poller_item.rrd_name, poller_item.rrd_num from (poller_output,poller_item) where (poller_output.local_data_id=poller_item.local_data_id and poller_output.rrd_name=poller_item.rrd_name) LIMIT 10000"
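The `Time:` value in the SYSTEM STATS line is presumably the poller's run time in seconds, and converting it helps with the timestamp puzzle: 86471.2080 s is 24 h 1 min 11 s, which is consistent with a poller that started around 9:00 AM the previous day and logged at 09:01:11 AM today. A small Python sketch of that parsing and arithmetic, assuming the stats line always consists of space-separated `Key:Value` pairs after `SYSTEM STATS:` as shown above (the helper names `parse_stats` and `format_duration` are hypothetical):

```python
def parse_stats(line: str) -> dict[str, str]:
    """Split the 'Key:Value' pairs after 'SYSTEM STATS:' into a dict.

    Assumes every token after the marker contains a colon.
    """
    _, _, body = line.partition("SYSTEM STATS:")
    return dict(tok.split(":", 1) for tok in body.split())


def format_duration(seconds: float) -> str:
    """Render a duration in seconds as 'Hh Mm Ss' (fractional seconds dropped)."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


line = ("04/24/2012 09:01:11 AM - SYSTEM STATS: Time:86471.2080 Method:spine "
        "Processes:8 Threads:10 Hosts:467 HostsPerProcess:59 "
        "DataSources:69193 RRDsProcessed:0")
stats = parse_stats(line)
print(format_duration(float(stats["Time"])))  # prints "24h 1m 11s"
```

By comparison, the configured limit in the "Maximum runtime of 298 seconds exceeded" line works out to 4 min 58 s.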