Hello,
From the poller information I can see that the poller is close to the 300 s maximum it has to complete all of its tasks (a lot of equipment and interfaces are checked over slow WAN links).
Is there a way to see which specific tasks or checks are taking the most time, so that I can try to optimize them? I am pretty sure there is room for optimization, but I don't know where to look.
Technical Support
General Information
Date Wed, 18 Aug 2010 12:49:50 +0200
Cacti Version 0.8.7e
Cacti OS unix
SNMP Version NET-SNMP version: 5.3.1
RRDTool Version RRDTool 1.2.x
Hosts 916
Graphs 5708
Data Sources Script/Command: 3506
SNMP: 1339
SNMP Query: 2505
Script Query: 2
Total: 7352
Poller Information
Interval 300
Type spine
Items Action[0]: 6010
Action[1]: 3305
Total: 9315
Concurrent Processes 5
Max Threads 25
PHP Servers 10
Script Timeout 20
Max OID 40
Last Run Statistics Time:295.3851 Method:spine Processes:5 Threads:25 Hosts:801 HostsPerProcess:161 DataSources:9367 RRDsProcessed:6630
PHP Information
PHP Version 5.1.6
PHP OS Linux
PHP uname Linux vrhfrndc01cac01 2.6.18-128.7.1.el5 #1 SMP Wed Aug 19 04:00:44 EDT 2009 i686
PHP SNMP Installed
max_execution_time 60
memory_limit 256
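For reference, here is a minimal sketch of how the "Last Run Statistics" line above can be turned into numbers to track the remaining headroom. The function name parse_poller_stats is made up for this illustration; the line and the 300 s interval are copied from the output above.

```python
def parse_poller_stats(line):
    """Split the 'Key:value' tokens of a 'Last Run Statistics' line into a dict."""
    stats = {}
    for token in line.split():
        key, _, value = token.partition(":")
        stats[key] = value
    return stats


# Copied from the "Last Run Statistics" entry above.
STATS_LINE = ("Time:295.3851 Method:spine Processes:5 Threads:25 Hosts:801 "
              "HostsPerProcess:161 DataSources:9367 RRDsProcessed:6630")
POLLER_INTERVAL = 300  # seconds, from "Interval 300" above

stats = parse_poller_stats(STATS_LINE)
run_time = float(stats["Time"])
headroom = POLLER_INTERVAL - run_time      # seconds left before the next cycle
utilization = run_time / POLLER_INTERVAL   # fraction of the interval used

print(f"run time {run_time:.1f}s, headroom {headroom:.1f}s, "
      f"{utilization:.1%} of interval used")
```

With the figures above, this leaves under 5 seconds of headroom per cycle.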
Any ideas on how to find the most time-consuming checks (those that are slow or timing out)?
Thanks.
poller reaching time limit : where to check the causes ?
Last edited by Canto on Wed Aug 18, 2010 3:18 pm, edited 1 time in total.
We have had this problem numerous times. We poll 287 devices with about 3000 data sources. We get timeouts most often when a device is either slow to respond or offline.
What works best is to bring down the ping timeout, if possible, and to lower the SNMP timeout as well (again, if possible). This way, if something is offline, you aren't wasting as much time waiting for a response.
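To put rough numbers on that, here is a small sketch of how much wall-clock time unanswered requests can cost. It is a simplified model, not how spine actually schedules its work: it assumes every offline host costs a fixed number of attempts at the full timeout and that the waiting is spread evenly over the threads. The function name and all figures are hypothetical.

```python
def wasted_seconds(offline_hosts, timeout_ms, attempts, threads):
    """Rough upper bound on time spent waiting for hosts that never answer.

    Assumes each offline host costs `attempts` full timeouts and that
    `threads` hosts are waited on in parallel (a simplification).
    """
    per_host = attempts * timeout_ms / 1000.0
    # Ceiling division: number of "rounds" needed to get through all hosts.
    batches = -(-offline_hosts // threads)
    return batches * per_host


# Hypothetical example: 20 offline hosts, 3 attempts each, 5 threads.
print(wasted_seconds(20, timeout_ms=2000, attempts=3, threads=5))
print(wasted_seconds(20, timeout_ms=500, attempts=3, threads=5))
```

Under these assumptions, cutting the timeout from 2000 ms to 500 ms reduces the time lost from 24 seconds to 6 seconds per cycle.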
How many CPU cores do you have?
Per this guide, "http://www.cacti.net/downloads/docs/htm ... spine.html", you can raise the number of concurrent poller processes to up to twice the number of CPU cores.
For devices that often time out, you can check Cacti's log, which is available somewhere under the Console tab, but I don't know of a way to view queries that are slow without actually timing out.
Re: poller reaching time limit : where to check the causes ?
Canto wrote:
...
Concurrent Processes 5
Max Threads 25
PHP Servers 10
...
In this case, total concurrent processes = 5 * (25 + 10) = 175
By default, MySQL's 'max_connections' is 151 (since MySQL 5.1.15).
You can check the current value as follows:
% mysql -u root -p -e "show variables like 'max_connections'"
I think you should either increase this value or reduce the number of poller processes.
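The same arithmetic as a small sketch, in case you want to try other settings. It simply reuses the estimate above (processes * (threads + PHP servers)); the function names and the reserve of 10 connections kept free for the web interface and other clients are assumptions made for this example.

```python
MYSQL_DEFAULT_MAX_CONNECTIONS = 151  # default since MySQL 5.1.15, as noted above


def estimated_connections(processes, threads, php_servers):
    """Estimate from this post: processes * (threads + PHP servers)."""
    return processes * (threads + php_servers)


def max_processes_within(limit, threads, php_servers, reserve=10):
    """Largest process count whose estimate stays under `limit`.

    `reserve` is a hypothetical margin left for other MySQL clients.
    """
    return max((limit - reserve) // (threads + php_servers), 0)


needed = estimated_connections(processes=5, threads=25, php_servers=10)
print(needed, needed > MYSQL_DEFAULT_MAX_CONNECTIONS)
print(max_processes_within(MYSQL_DEFAULT_MAX_CONNECTIONS, threads=25, php_servers=10))
```

With the settings quoted above, the estimate (175) exceeds the default limit, and 4 processes would fit under it with the assumed reserve.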
Reference: Stuck spine processes never exit and fill up RAM
Thanks for this information; I will go through it to fine-tune my setup.
By increasing the log level and looking directly at the Cacti log file, I spotted a device for which Cacti was configured with duplicate queries. Since I collect a lot of interface data from this device, and the link has a ~300 ms response time, polling it took too long to complete before the end of the poller interval. By removing the duplicate queries, I have greatly reduced the time required to poll this device, and the poller time is now back to normal.
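As a rough illustration of why the duplicates hurt so much on a slow link, here is a small sketch. The item tuples, host name, and request counts are made up for the example, and the timing model assumes one round trip per request with requests sent one after another, which ignores any grouping of OIDs into a single request.

```python
from collections import Counter


def find_duplicates(items):
    """Return the poller items that are configured more than once."""
    counts = Counter(items)
    return {item: n for item, n in counts.items() if n > 1}


def serial_poll_seconds(requests, rtt_ms):
    """Time to issue `requests` sequential requests at `rtt_ms` per round trip."""
    return requests * rtt_ms / 1000.0


# Hypothetical poller items: (host, data query field, interface index)
items = [
    ("router-a", "ifInOctets", 1),
    ("router-a", "ifOutOctets", 1),
    ("router-a", "ifInOctets", 1),  # same item configured a second time
    ("router-a", "ifInOctets", 2),
]
dupes = find_duplicates(items)
print(dupes)

# Hypothetical counts: 400 requests with duplicates, 200 once they are removed.
print(serial_poll_seconds(400, rtt_ms=300), serial_poll_seconds(200, rtt_ms=300))
```

With a ~300 ms round trip, halving the number of requests in this example saves a full minute of the 300 s interval.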
By the way, tuning the values as you both suggest and adjusting the SNMP timeout will definitely help improve my system.
Thanks for your help.