Hi All,
We've got a problem with our Windows Cacti system: one of the four spine processes will not close after a poll has finished. This looks like the issue reported against spine 088; however, we have already upgraded the system to 088a. The issue is slightly annoying as it occurs at seemingly random intervals (e.g. every 3-8 days).
The workaround is simple: wait until the scheduled poll has finished and then kill the leftover process. But I don't want to have to do that continually; I'd rather fix the issue. The leftover spine process consumes CPU cycles and eventually affects normal polling, increasing the average polling time from 75-85 seconds to 85-100 seconds.
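If the workaround has to be scripted in the meantime, the check boils down to "is any spine process older than one polling cycle?". A minimal Python sketch, assuming you already have a list of (PID, name, start time) records, for example gathered with the third-party psutil package (not shown here). `ProcInfo`, `find_stale_spine` and the 30-second `grace` default are illustrative names and values, not part of Cacti or spine:

```python
import time
from typing import Iterable, List, NamedTuple


class ProcInfo(NamedTuple):
    """Minimal process record (hypothetical); collecting these is out of scope."""
    pid: int
    name: str
    start_time: float  # seconds since the epoch


def find_stale_spine(procs: Iterable[ProcInfo], now: float,
                     poll_interval: int = 300, grace: int = 30) -> List[int]:
    """Return PIDs of spine processes that have outlived one polling cycle.

    A spine process started by a poll should exit before the next poll, so
    anything older than poll_interval + grace seconds is treated as stuck.
    """
    limit = poll_interval + grace
    return [p.pid for p in procs
            if p.name.lower() in ("spine", "spine.exe")
            and now - p.start_time > limit]


if __name__ == "__main__":
    now = time.time()
    sample = [
        ProcInfo(101, "spine.exe", now - 60),   # current poll, still working
        ProcInfo(102, "spine.exe", now - 900),  # left over from an earlier poll
        ProcInfo(103, "php.exe", now - 900),    # not spine, ignored
    ]
    print(find_stale_spine(sample, now))  # [102]
```

Only detection is sketched; whether and when to kill the returned PIDs (ideally after the current poll has finished) is left to you.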
Background: we upgraded to spine 088a (Cacti is and remains 0.8.7e) to try to solve a problem with graphs dropping out across the board. Unfortunately, this issue (spine processes staying running) started after the upgrade. I also made a mistake in the spine config when setting it up for 088a: I hadn't included the "preg" option (DB_PreG), so I had to clear out the poller cache to fix that.
The graphs dropping out appear to have been caused by the VMs not being balanced across the underlying ESX servers, which led to memory contention. There was also an issue with the Windows scheduled task time limit being set lower than the polling interval (it was set to stop the task after 4 minutes), when Cacti will actually stop the poller itself at (polling interval - 2 seconds). I'm happy to update the Windows setup documentation to mention this, along with some other tips and tricks, if someone can point me in the right direction.
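The scheduled-task point comes down to a single comparison. A small sketch using the figure quoted above (Cacti ends the run itself at the polling interval minus 2 seconds); the function name and the margin parameter are assumptions for illustration:

```python
def task_limit_ok(poll_interval_s: int, task_stop_after_s: int,
                  cacti_margin_s: int = 2) -> bool:
    """True if the scheduled task's 'stop after' limit is not shorter than
    Cacti's own cut-off of (poll_interval_s - cacti_margin_s) seconds."""
    cacti_cutoff_s = poll_interval_s - cacti_margin_s
    return task_stop_after_s >= cacti_cutoff_s


if __name__ == "__main__":
    # 300 s interval: Cacti's own cut-off is 298 s.
    print(task_limit_ok(300, 4 * 60))  # False: the task would kill the poller at 240 s
    print(task_limit_ok(300, 300))     # True
```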
I'd be grateful if:
- The devs could confirm that the fix for processes not closing in 088 was rolled into the Windows spine for 088a. I'm assuming so, but since the problem looks almost the same, I thought I'd ask. Our spine 088a is the precompiled spine.exe that came with the 088a Windows install, so we're happy to recompile if recommended. It's definitely 088a we're using (checked via spine --help), and it's the binary in the Spine Poller File Path folder.
- People could advise what further troubleshooting I can or should do.
- I've already looked through the logs and enabled debug for one run (both when the additional spine process is running and when it is not). I'm still going through those logs, but they don't point to anything obvious.
- I've already checked the Windows Event Logs and reviewed CPU, memory, disk, scheduled and other tasks, and their logs, but nothing obvious shows up until the error occurs.
- I've checked the mysql.ini file for the number of connections: it's currently set to 200, which seems to be within the recommended amount for our number of concurrent processes and threads. I did notice that someone else had a similar problem that was fixed by dramatically increasing this.
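As a sanity check on the 200 figure, a rough estimate can be computed from the poller settings in the support info below (4 concurrent processes, 10 threads, 5 PHP servers). The per-process breakdown used here (one main connection, plus one per thread, plus one per PHP script server) and the `overhead` allowance are assumptions, not something confirmed from the spine source:

```python
def estimate_db_connections(processes: int, threads: int, php_servers: int,
                            overhead: int = 10) -> int:
    """Rough upper bound on MySQL connections during a poll.

    ASSUMPTION: each spine process holds one main connection plus one per
    thread, and each PHP script server it starts opens one more. 'overhead'
    covers poller.php, the web UI and anything else talking to MySQL.
    """
    per_process = 1 + threads + php_servers
    return processes * per_process + overhead


if __name__ == "__main__":
    needed = estimate_db_connections(processes=4, threads=10, php_servers=5)
    print(needed, needed <= 200)  # 74 True
```

If the real usage turns out to be higher than this model (for example, connections not being released by a stuck process), the difference would show up when comparing against the live connection count in MySQL.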
spine config
DB_Host 127.0.0.1
DB_Database cacti
DB_User <user>
DB_Pass <password>
DB_Port 3306
DB_PreG 1
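Because a missing DB_PreG line caused trouble earlier, a quick check of the config file can catch it before the poller runs. A minimal sketch, assuming spine.conf uses whitespace-separated `Key Value` lines with `#` comments, and that DB_PreG should be 1 when spine 088a polls this pre-0.8.7g ("pre-G") Cacti; the helper names are hypothetical:

```python
from typing import Dict, List

# Keys shown in the spine config above.
EXPECTED_KEYS = ("DB_Host", "DB_Database", "DB_User", "DB_Pass", "DB_Port", "DB_PreG")


def parse_spine_conf(text: str) -> Dict[str, str]:
    """Parse whitespace-separated 'Key Value' lines, skipping blanks and '#' comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def check_spine_conf(settings: Dict[str, str]) -> List[str]:
    """List problems, assuming a pre-0.8.7g Cacti needs DB_PreG set to 1."""
    problems = [f"missing {key}" for key in EXPECTED_KEYS if key not in settings]
    preg = settings.get("DB_PreG")
    if preg is not None and preg != "1":
        problems.append(f"DB_PreG is {preg}, expected 1")
    return problems


if __name__ == "__main__":
    conf = "DB_Host 127.0.0.1\nDB_Database cacti\nDB_User cactiuser\nDB_Pass secret\nDB_Port 3306\n"
    print(check_spine_conf(parse_spine_conf(conf)))  # ['missing DB_PreG']
```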
Cacti Technical Support Info
General Information
Date Mon, 18 Jun 2012 09:28:55 +1000
Cacti Version 0.8.7e
Cacti OS win32
SNMP Version NET-SNMP version: 5.4.2.1
RRDTool Version RRDTool 1.2.x
Hosts 1657
Graphs 11105
Data Sources
  SNMP: 7821
  SNMP Query: 7622
  Script - Script Server (PHP): 9
  Total: 15452
Poller Information
Interval 300
Type spine
Items
  Action[0]: 17232
  Action[2]: 1
  Total: 17233
Concurrent Processes 4
Max Threads 10
PHP Servers 5
Script Timeout 30
Max OID 60
Last Run Statistics Time:78.6240 Method:spine Processes:4 Threads:10 Hosts:1623 HostsPerProcess:406 DataSources:17233 RRDsProcessed:10349
PHP Information
PHP Version 5.2.8
PHP OS WINNT
PHP uname Windows NT CORE-MG13L 5.2 build 3790
PHP SNMP Installed
max_execution_time 30
memory_limit 488M
MySQL Table Information
Name Rows Engine Collation Check Status
cdef 29 MyISAM latin1_swedish_ci OK
cdef_items 51 MyISAM latin1_swedish_ci OK
colors 108 MyISAM latin1_swedish_ci OK
data_input 8 MyISAM latin1_swedish_ci OK
data_input_data 173417 MyISAM latin1_swedish_ci OK
data_input_fields 57 MyISAM latin1_swedish_ci OK
data_local 15453 MyISAM latin1_swedish_ci OK
data_template 136 MyISAM latin1_swedish_ci OK
data_template_data 15589 MyISAM latin1_swedish_ci OK
data_template_data_rra 74064 MyISAM latin1_swedish_ci OK
data_template_rrd 23061 MyISAM latin1_swedish_ci OK
graph_local 11105 MyISAM latin1_swedish_ci OK
graph_template_input 269 MyISAM latin1_swedish_ci OK
graph_template_input_defs 890 MyISAM latin1_swedish_ci OK
graph_templates 86 MyISAM latin1_swedish_ci OK
graph_templates_gprint 6 MyISAM latin1_swedish_ci OK
graph_templates_graph 11189 MyISAM latin1_swedish_ci OK
graph_templates_item 101570 MyISAM latin1_swedish_ci OK
graph_tree 13 MyISAM latin1_swedish_ci OK
graph_tree_items 1859 MyISAM latin1_swedish_ci OK
host 1657 MyISAM latin1_swedish_ci OK
host_graph 3893 MyISAM latin1_swedish_ci OK
host_snmp_cache 695185 MyISAM latin1_swedish_ci OK
host_snmp_query 1385 MyISAM latin1_swedish_ci OK
host_template 14 MyISAM latin1_swedish_ci OK
host_template_graph 38 MyISAM latin1_swedish_ci OK
host_template_snmp_query 27 MyISAM latin1_swedish_ci OK
plugin_config 5 MyISAM latin1_swedish_ci OK
plugin_db_changes 0 MyISAM latin1_swedish_ci OK
plugin_fix64bit 0 MyISAM latin1_swedish_ci OK
plugin_hooks 24 MyISAM latin1_swedish_ci OK
plugin_realms 4 MyISAM latin1_swedish_ci OK
plugin_rrdclean 138 MyISAM latin1_swedish_ci OK
plugin_rrdclean_action 0 MyISAM latin1_swedish_ci OK
poller 0 MyISAM latin1_swedish_ci OK
poller_command 0 MyISAM latin1_swedish_ci OK
poller_item 17238 MyISAM latin1_swedish_ci OK
poller_output 315 MyISAM latin1_swedish_ci OK
poller_reindex 1356 MyISAM latin1_swedish_ci OK
poller_time 2 MyISAM latin1_swedish_ci OK
rra 5 MyISAM latin1_swedish_ci OK
rra_cf 18 MyISAM latin1_swedish_ci OK
settings 145 MyISAM latin1_swedish_ci OK
settings_graphs 140 MyISAM latin1_swedish_ci OK
settings_tree 0 MyISAM latin1_swedish_ci OK
snmp_query 19 MyISAM latin1_swedish_ci OK
snmp_query_graph 44 MyISAM latin1_swedish_ci OK
snmp_query_graph_rrd 124 MyISAM latin1_swedish_ci OK
snmp_query_graph_rrd_sv 67 MyISAM latin1_swedish_ci OK
snmp_query_graph_sv 57 MyISAM latin1_swedish_ci OK
user_auth 92 MyISAM latin1_swedish_ci OK
user_auth_perms 1437 MyISAM latin1_swedish_ci OK
user_auth_realm 421 MyISAM latin1_swedish_ci OK
user_log 4122 MyISAM latin1_swedish_ci OK
version 1 MyISAM latin1_swedish_ci OK
MySQL Version 5.0.58 for Win32
Max_Connections 200
Web Server IIS 6 / Windows 2003 R2
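To track the polling-time creep described earlier (75-85 s rising to 85-100 s), the "Last Run Statistics" string can be parsed and compared against a threshold. A sketch assuming the `Key:Value` token layout shown above; the 90-second threshold is an arbitrary example value:

```python
from typing import Dict


def parse_stats(line: str) -> Dict[str, str]:
    """Collect 'Key:Value' tokens from a poller statistics line."""
    stats: Dict[str, str] = {}
    for token in line.split():
        key, sep, value = token.partition(":")
        if sep:  # skip tokens that contain no colon
            stats[key] = value
    return stats


def poll_is_slow(stats: Dict[str, str], threshold_s: float = 90.0) -> bool:
    """True if the run took longer than threshold_s seconds (example threshold)."""
    return float(stats["Time"]) > threshold_s


if __name__ == "__main__":
    last_run = ("Time:78.6240 Method:spine Processes:4 Threads:10 Hosts:1623 "
                "HostsPerProcess:406 DataSources:17233 RRDsProcessed:10349")
    stats = parse_stats(last_run)
    print(stats["RRDsProcessed"], poll_is_slow(stats))  # 10349 False
```

Logging `Time` and `RRDsProcessed` from each run side by side would also show whether a flat RRD count coincides with the slow polls.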
Spine 088a - Processes staying open occasionally after poll
Re: Spine 088a - Processes staying open occasionally after p
What scripts are stuck running? Tools like sysinternals process explorer can provide this info. Sounds like you might need to modify the script(s) and include timeout logic.
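On the timeout logic: one option is for a data input script to wrap its slow call in a hard time limit and return `U` (the unknown value Cacti and RRDtool accept) rather than hang. A minimal Python sketch using `subprocess.run(..., timeout=...)`, which kills the child when the limit expires; `run_with_timeout` is a hypothetical helper, and the limit you pass should presumably stay below the poller's Script Timeout (30 s in the support info above):

```python
import subprocess
import sys
from typing import List


def run_with_timeout(cmd: List[str], timeout_s: float) -> str:
    """Run a data-gathering command and return its trimmed stdout.

    On timeout, subprocess.run kills the direct child (not any grandchildren)
    and raises TimeoutExpired; we return 'U' instead of hanging the poller.
    A non-zero exit code or empty output also yields 'U'.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "U"
    out = result.stdout.strip()
    return out if result.returncode == 0 and out else "U"


if __name__ == "__main__":
    fast = [sys.executable, "-c", "print(42)"]
    slow = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_with_timeout(fast, 5))  # 42
    print(run_with_timeout(slow, 1))  # U
```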
Re: Spine 088a - Processes staying open occasionally after p
Hi BSOD,
Thanks for getting back to me. I'm not quite sure how to check which scripts are not completing: when it occurs (at a seemingly random interval), the RRDs processed count does not change at all (e.g. in the poll before the spine process stays open, it's around 10379). Normally we expect some movement in the number of RRDs processed due to the size and connection types of our remote sites.
Short of running debug continually (which itself increases the polling time), I'm not sure how to check for this, so I'd be grateful for any suggestions. What about setting the Cacti poller logging level to Medium instead?
I'm occasionally getting "Poller Output Table not Empty" warnings (see below), but from the poller log this seems to happen after SNMP polling of a device times out for one or two intervals.
06/19/2012 10:10:00 AM - POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 12, Data Sources: traffic_out(DS[2995] Graphs['<router> - Traffic - <interface> - |query_ifAlias|']), traffic_out(DS[2996] Graphs['<router> - Traffic - <interface> - Link to <router> <interface>']), traffic_out(DS[2997] Graphs['<router> - Traffic - <interface> - Link to <interface>']), traffic_out(DS[2998] Graphs['<router> - Traffic - <interface> - Link to <interface>']), traffic_out(DS[3001] Graphs['<router> - Traffic - <interface>']), traffic_out(DS[3003] Graphs['<router> - Traffic - <interface>']), traffic_out(DS[3004] Graphs['<router> - Traffic - <interface>']), traffic_out(DS[3005] Graphs['<router> - Traffic - <interface>']), traffic_out(DS[3006] Graphs['<router> - Traffic - <interface>']), cpu(DS[16918] Graphs['<switch> - 0.0.0.0 - <switch> - CPU Usage']), cpu10min(DS[16918] Graphs['<switch> - 0.0.0.0 - <switch> - CPU Usage']), cpu1hour(DS[16918] Graphs['<switch> - 0.0.0.0 - <switch> - CPU Usage'])
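To see which devices keep showing up in these warnings, the data source IDs can be pulled out of the message. A sketch using a regular expression on the `name(DS[id]` pattern above; mapping the IDs to devices (for example via the host_id column of the data_local table, assuming the DS numbers are local data source IDs) is left as a follow-up. The sample text is shortened and its graph names are made up:

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches e.g. "traffic_out(DS[2995]" and captures the name and the numeric ID.
DS_PATTERN = re.compile(r"(\w+)\(DS\[(\d+)\]")


def stuck_data_sources(warning: str) -> Dict[int, List[str]]:
    """Group data source names from a 'Poller Output Table not Empty' warning by DS ID."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for name, ds_id in DS_PATTERN.findall(warning):
        grouped[int(ds_id)].append(name)
    return dict(grouped)


if __name__ == "__main__":
    sample = ("Poller Output Table not Empty. Issues Found: 3, Data Sources: "
              "traffic_out(DS[2995] Graphs['r1 - Traffic']), "
              "cpu(DS[16918] Graphs['s1 - CPU Usage']), "
              "cpu10min(DS[16918] Graphs['s1 - CPU Usage'])")
    print(stuck_data_sources(sample))
    # {2995: ['traffic_out'], 16918: ['cpu', 'cpu10min']}
```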
Re: Spine 088a - Processes staying open occasionally after p
Any suggestions for troubleshooting? The problem is that the issue only seems to occur at random.
I'm going through and cleaning up dead graphs and doing other miscellaneous tasks, which seems to be making it run better, but I don't know whether it's been fixed.
Is there any issue with using the precompiled 088a spine.exe for Windows with a much earlier Cacti version, 0.8.7e (with the pre-G option configured)? It should include the fix for the 088 bug where spine processes did not close, which, from what I've read, resembles this issue, though to a much lesser extent.
- gandalf
- Developer
Re: Spine 088a - Processes staying open occasionally after p
Run spine with verbosity=3 to get a per-host poller runtime value printed in cacti.log. That should help.
R.
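Once verbosity is raised, the slowest hosts can be picked out of cacti.log. This sketch assumes the per-host runtime appears as `Host[<id>] ... Total Time: <seconds> Seconds`; that format is a guess rather than confirmed spine output (the sample lines below are made up), so check a few real lines and adjust the pattern first:

```python
import re
from typing import Dict, Iterable, List, Tuple

# ASSUMED format of spine's per-host runtime line; verify against your cacti.log.
# The exponent group tolerates values printed in %g style (e.g. "1.2e+02").
HOST_TIME = re.compile(
    r"Host\[(\d+)\].*?Total Time:\s*([\d.]+(?:e[+-]?\d+)?)\s*Seconds")


def slowest_hosts(lines: Iterable[str], top: int = 5) -> List[Tuple[int, float]]:
    """Return up to `top` (host_id, seconds) pairs, slowest first.

    If a host appears several times, its longest runtime is kept.
    """
    worst: Dict[int, float] = {}
    for line in lines:
        match = HOST_TIME.search(line)
        if match:
            host_id, secs = int(match.group(1)), float(match.group(2))
            worst[host_id] = max(secs, worst.get(host_id, 0.0))
    return sorted(worst.items(), key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    sample = [
        "SPINE: Poller[0] Host[12] TH[1] Total Time: 0.85 Seconds",
        "SPINE: Poller[0] Host[40] TH[1] Total Time: 61 Seconds",
        "SPINE: Poller[0] Host[12] TH[1] (hypothetical non-timing line)",
    ]
    print(slowest_hosts(sample))  # [(40, 61.0), (12, 0.85)]
```

Hosts that sit near the top run after run are the first candidates for the stuck process.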