When running with Spine, I get lots of timeouts:
SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal
POLLER: Poller[0] WARNING: There are '1' detected as overrunning a polling process, please investigate
I have found that when I decrease the server timeout, Spine fits within 300 s more often, but still only in about 3 out of 10 runs.
The issue looks like network lag: this is a WAN WiMAX network, and at certain points a 100 ms ping and 2-5 s SNMP response times are normal (that applies to 80% of the hosts).
Mostly because of the hosts that are currently down, the accumulated timeouts push the total beyond 300 s.
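To see how a few dead hosts can eat the 300 s budget, here is a rough back-of-envelope sketch. It uses a hypothetical model, which is an assumption and not how Cacti or Spine actually schedule work: one poller process walks its hosts one after another, and a dead host under "ICMP ping or snmp" detection costs one full ping timeout plus one SNMP timeout per try. The retry count (`snmp_retries`) and the 5 s per live host are made-up illustration values; only the two timeouts come from the setup listed in this post.

```python
# Hypothetical back-of-envelope model: one poller process handles its hosts
# sequentially, so the time spent on each host simply adds up.

PING_TIMEOUT_S = 1.5   # "ping timeout: 1500" ms, from the setup in this post
SNMP_TIMEOUT_S = 5.0   # "snmp timeout: 5000" ms, from the setup in this post
INTERVAL_S = 300       # polling interval in seconds


def down_host_cost(ping_timeout_s, snmp_timeout_s, snmp_retries):
    """Assumed worst case for a dead host under 'ICMP ping or snmp'
    detection: the ping times out, then every SNMP try times out too."""
    return ping_timeout_s + snmp_timeout_s * (snmp_retries + 1)


def serial_poll_time(up_host_times_s, down_hosts, snmp_retries):
    """Total seconds for one process: live hosts plus dead-host timeouts."""
    dead = down_hosts * down_host_cost(PING_TIMEOUT_S, SNMP_TIMEOUT_S, snmp_retries)
    return sum(up_host_times_s) + dead


if __name__ == "__main__":
    # 35 hosts per process, as in the last-run statistics; each live host
    # is assumed to take 5 s (the slow end of "2-5 s SNMP").
    for up, down, retries in [(30, 5, 1), (25, 10, 3)]:
        total = serial_poll_time([5.0] * up, down, retries)
        status = "overrun" if total > INTERVAL_S else "ok"
        print(f"{up} up, {down} down, {retries} retries: {total:.1f} s ({status})")
```

Under these assumptions, 30 live hosts plus 5 dead ones with one retry come to 207.5 s, while 25 live plus 10 dead with three retries reach 340 s and overrun the interval, even though nothing about the live hosts changed.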
cmd.php manages to poll in under 200 s and works rather nicely, but not always.
cmd.php also leaves gaps in graphs, though less often than Spine, and without any errors in the log. 90% of my data is SNMP-based; apart from that there is only one short script (which also reads SNMP) and a few indexed queries.
So my question is: how do I tune Spine/Cacti to work well on a high-latency network with ~500 hosts and ~10k data sources (which I will face soon; right now it is ~130 hosts and ~800 data sources)? Or where is the problem in this setup?
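For the scaling part of the question, a crude capacity estimate can help frame the numbers. This is a minimal sketch under idealized assumptions that are not Cacti's actual behaviour: data sources are spread evenly over hosts, each SNMP request carries up to `max_oid` OIDs, every request takes the worst-case 5 s, and the work splits perfectly evenly across parallel workers (processes × threads). The 250 s budget is a hypothetical safety margin under the 300 s interval, and the function names are illustrative only.

```python
import math


def requests_per_host(hosts, data_sources, max_oid):
    """Average SNMP get requests per host, assuming data sources are spread
    evenly and each request carries up to max_oid OIDs (both assumptions)."""
    items_per_host = -(-data_sources // hosts)   # ceiling division
    return -(-items_per_host // max_oid)


def min_workers(hosts, data_sources, max_oid, request_s, budget_s):
    """Smallest number of parallel workers that could finish within
    budget_s if the total waiting time were split perfectly evenly."""
    total_s = hosts * requests_per_host(hosts, data_sources, max_oid) * request_s
    return math.ceil(total_s / budget_s)


if __name__ == "__main__":
    # request_s = 5.0: worst case of the "2-5 s SNMP" latency from the post.
    # budget_s = 250: hypothetical headroom below the 300 s interval.
    print(min_workers(137, 815, 10, 5.0, 250))    # current setup -> 3
    print(min_workers(500, 10000, 10, 5.0, 250))  # planned setup -> 20
```

The estimate ignores dead-host timeouts and uneven host distribution, so real requirements will be higher; it only shows how the required parallelism grows with host count and OIDs per host.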
My setup:
Boost installed, running on a Hyper-V virtual machine;
for now it has 4 cores and 3 GB RAM, but plenty more is available if needed
Cacti Version: 0.8.8a
Spine Version: 0.8.8a
Boost version: 5.1.1
Cacti OS: unix
SNMP Version: NET-SNMP version: 5.7.2
RRDTool Version: RRDTool 1.4.x
Hosts: 137
Graphs: 446
Data Sources:
  Script/Command: 24
  SNMP: 233
  SNMP Query: 279
  Total: 536
Poller Information
  Interval: 300 s
  Type: cmd.php
  Cron: 300 s
  Items
    Action[0]: 791
    Action[1]: 24
    Total: 815
  Concurrent Processes: 4
  Max Threads: 20
  PHP Servers: 5
  Script Timeout: 120 s
  Max OID: 10
  Ping timeout: 1500 ms
  SNMP timeout: 5000 ms
  Downed detection: ICMP ping or SNMP
Last Run Statistics
Time:170.8850 Method:cmd.php Processes:4 Threads:N/A Hosts:138 HostsPerProcess:35 DataSources:815 RRDsProcessed:0
PHP Information
PHP Version: 5.4.7
PHP OS: FreeBSD
PHP uname: FreeBSD bragi.mgmt.local 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan 3 07:46:30 UTC 2012 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
PHP SNMP: Installed
max_execution_time: 60
memory_limit: 1024M
I'd be really grateful if anyone could help solve this issue.