A picture is worth a thousand words... Has anyone seen Cacti performance degrade over several days and then be fixed by a reboot?
This Cacti install runs fine for a few days with a runtime of approx. 7 seconds. Then performance degrades almost linearly until the runtime is approx. 25 seconds. A reboot immediately fixes the issue, and then after a few days it repeats.
Cacti 0.8.8b, spine, 2 pollers, 4 threads, 4 PHP script servers, 10-second script server timeout, 17 hosts / 3170 data sources, 1-minute polling, VMware 5.5 VM host (lightly loaded), CentOS 6.6 VM, php.ini memory_limit=512M.
An interesting side note: there is another Cacti VM running just fine on that same VM host. In fact, the problem system is a CLONE of the other (although it monitors different devices).
The hardware is fairly decent: 2x quad-core 2.67 GHz, 72GB RAM, 15k SAS drives with disk latency < 2ms.
And one final piece of evidence: when the poller runtime gets very bad, I sometimes can no longer open new SSH sessions to that server. Existing sessions are OK. Yet RAM usage is only 500MB of 4GB, and no swap is in use.
May I borrow some knowledge? Any ideas? Logfiles that you'd recommend checking? Nothing in /var/log/ has jumped out at me.
Thanks!
Mark
[SOLVED] Cacti runtime gets steadily worse; reboot fixes
Attachment: Screen Shot 2018-03-10 at 12.21.17 AM.png (79.45 KiB)
Last edited by hayzey on Sat Mar 10, 2018 3:35 pm, edited 3 times in total.
Re: Cacti runtime gets steadily worse; reboot fixes issue
Have you checked how many open connections there are?
Cacti Developer & Release Manager
The Cacti Group
Director
BV IT Solutions Ltd
+--------------------------------------------------------------------------+
Cacti Resources:
Cacti Website (including releases)
Cacti Issues
Cacti Development Releases
Cacti Development Documentation
Re: Cacti runtime gets steadily worse; reboot fixes issue
netniV wrote: Have you checked how many open connections there are?
For network connections, "netstat" shows:
TCP connections = 34 (just HTTP and SSH)
sockets = 148
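To track the connection count over the days the runtime degrades, rather than taking a single netstat snapshot, the TCP table can be read straight from the kernel. This is a minimal sketch, assuming a Linux host where /proc/net/tcp and /proc/net/tcp6 are available (true on CentOS 6); the function names are hypothetical and it is not how netstat itself is implemented. The state codes are the kernel's TCP state numbers in hex.

```python
from collections import Counter

# Kernel TCP state codes as they appear (in hex) in the "st" column.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(text):
    """Count sockets per state from the text of /proc/net/tcp or tcp6."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) > 3:
            state = fields[3].upper()
            # Unknown codes are kept under their raw hex value.
            counts[TCP_STATES.get(state, state)] += 1
    return counts


def read_tcp_states(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Sum the per-state counts over IPv4 and IPv6 tables (hypothetical helper)."""
    total = Counter()
    for path in paths:
        try:
            with open(path) as f:
                total += count_tcp_states(f.read())
        except FileNotFoundError:
            pass  # e.g. IPv6 disabled on this host
    return total
```

Logging `read_tcp_states()` once per polling cycle would show whether any state's count climbs in step with the poller runtime, which a one-off count of 34 cannot reveal.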
For MySQL connections, the MySQL status output shows:
Uptime: 47825 Threads: 18 Questions: 6354060 Slow queries: 0 Opens: 1782 Flush tables: 1 Open tables: 63 Queries per second avg: 132.860
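The "Queries per second avg" figure in that status line is consistent with Questions divided by Uptime (6354060 / 47825 ≈ 132.86), so it is a lifetime average since the server started, not a current rate. A small sketch, assuming the line has exactly the `Name: value` layout shown above; `parse_status_line` is a hypothetical helper, not part of any MySQL tool:

```python
import re

# Keys may contain spaces ("Slow queries", "Queries per second avg").
_PAIR_RE = re.compile(r"([A-Za-z][A-Za-z ]*?):\s*([0-9.]+)")


def parse_status_line(line):
    """Parse a 'Name: value Name: value ...' status line into {name: float}."""
    return {key.strip(): float(value) for key, value in _PAIR_RE.findall(line)}


def average_qps(status):
    """Lifetime average queries per second: Questions / Uptime."""
    return status["Questions"] / status["Uptime"]
```

Because this is an average since startup, two samples taken a few minutes apart (difference in Questions divided by difference in Uptime) would give a better picture of the load during a slow polling interval.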
mysql> show status like '%onn%';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| Aborted_connects | 0 |
| Connections | 21559 |
| Max_used_connections | 26 |
| Ssl_client_connects | 0 |
| Ssl_connect_renegotiates | 0 |
| Ssl_finished_connects | 0 |
| Threads_connected | 11 |
+--------------------------+-------+
7 rows in set (0.00 sec)
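One way to answer whether 11 connected threads is normal for this box is to record Threads_connected over time and compare it with the poller runtime. The sketch below parses the ASCII table printed by the mysql client into a dictionary; it assumes the two-column `| Variable_name | Value |` layout shown above, and `parse_mysql_table` is a hypothetical name.

```python
def parse_mysql_table(text):
    """Turn a two-column table printed by the mysql client into {name: value}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and "N rows in set" lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Keep only data rows; the header row names the columns.
        if len(cells) == 2 and cells[0] != "Variable_name":
            result[cells[0]] = cells[1]
    return result
```

Values are left as strings because SHOW STATUS can return non-numeric values; convert the ones you chart, e.g. `int(status["Threads_connected"])`.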
Do 11 connected MySQL threads seem excessive in your opinion? Other Cacti installs I have access to currently show only 1 thread.
Also, are there other connection types you're referring to?
Thanks for the suggestion!
Re: Cacti runtime gets steadily worse; reboot fixes issue
I should add that cacti.log doesn't show any obvious issues (log level: LOW), and no service check timeouts are occurring.
03/10/2018 02:37:09 AM - SYSTEM STATS: Time:8.2404 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:38:12 AM - SYSTEM STATS: Time:9.9969 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:39:11 AM - SYSTEM STATS: Time:9.5533 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:40:10 AM - SYSTEM STATS: Time:8.7720 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:41:10 AM - SYSTEM STATS: Time:8.9082 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:42:10 AM - SYSTEM STATS: Time:9.1446 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:43:10 AM - SYSTEM STATS: Time:9.4614 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:44:12 AM - SYSTEM STATS: Time:9.9143 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:45:10 AM - SYSTEM STATS: Time:9.1248 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:46:11 AM - SYSTEM STATS: Time:9.0242 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:47:11 AM - SYSTEM STATS: Time:10.0440 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:48:11 AM - SYSTEM STATS: Time:9.3316 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:49:12 AM - SYSTEM STATS: Time:11.3500 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:50:11 AM - SYSTEM STATS: Time:9.2593 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:51:11 AM - SYSTEM STATS: Time:9.5111 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:52:13 AM - SYSTEM STATS: Time:11.2407 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:53:10 AM - SYSTEM STATS: Time:8.9430 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:54:11 AM - SYSTEM STATS: Time:9.7517 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:55:11 AM - SYSTEM STATS: Time:9.8771 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:56:12 AM - SYSTEM STATS: Time:10.2138 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:57:10 AM - SYSTEM STATS: Time:9.2466 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:58:12 AM - SYSTEM STATS: Time:11.8192 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 02:59:11 AM - SYSTEM STATS: Time:9.8942 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 03:00:11 AM - SYSTEM STATS: Time:10.1934 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 03:01:14 AM - SYSTEM STATS: Time:12.7712 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 03:02:13 AM - SYSTEM STATS: Time:10.9460 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 03:03:12 AM - SYSTEM STATS: Time:10.8047 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 03:04:14 AM - SYSTEM STATS: Time:12.5198 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 03:05:12 AM - SYSTEM STATS: Time:10.7918 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 03:06:12 AM - SYSTEM STATS: Time:11.1644 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 03:07:12 AM - SYSTEM STATS: Time:11.5918 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 03:08:14 AM - SYSTEM STATS: Time:12.5665 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 03:09:13 AM - SYSTEM STATS: Time:11.2593 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 03:10:12 AM - SYSTEM STATS: Time:10.7017 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 03:11:14 AM - SYSTEM STATS: Time:12.3907 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
03/10/2018 03:12:15 AM - SYSTEM STATS: Time:13.9286 Method:spine Processes:2 Threads:4 Hosts:17 HostsPerProcess:9 DataSources:3174 RRDsProcessed:886
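The excerpt above covers about 35 minutes, and the runtime drifts from roughly 8 to 14 seconds. To put a number on "degrades almost linearly", the SYSTEM STATS lines can be fitted with an ordinary least-squares line. A minimal sketch, assuming the `MM/DD/YYYY hh:mm:ss AM` timestamp format seen here (it matches the 2018-03-10 screenshot date); the function names are hypothetical.

```python
import re
from datetime import datetime

# Timestamp and poller runtime from a cacti.log SYSTEM STATS line.
STATS_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M) - SYSTEM STATS: Time:([\d.]+)"
)


def parse_runtimes(lines):
    """Return [(timestamp, runtime_seconds)] for each SYSTEM STATS line."""
    samples = []
    for line in lines:
        m = STATS_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%m/%d/%Y %I:%M:%S %p")
            samples.append((ts, float(m.group(2))))
    return samples


def slope_per_hour(samples):
    """Least-squares slope of runtime vs. time, in seconds of runtime per hour."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0 = samples[0][0]
    xs = [(ts - t0).total_seconds() / 3600 for ts, _ in samples]
    ys = [runtime for _, runtime in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all samples share one timestamp")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Run over a full cacti.log between reboots, a steady positive slope would confirm a gradual trend, while a step change would point toward a discrete event instead.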
Re: Cacti runtime gets steadily worse; reboot fixes issue
This cacti system is delaying the sending of snmp requests during busy intervals. Even manual snmpwalks are hanging for up to 15 seconds before the first request is sent to the remote device, which then responds quickly.
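Since the delay is reproducible with manual commands, timing several runs back to back during a busy interval and again during a quiet one makes the symptom measurable. This sketch only records total wall-clock time per run, which includes any delay before the first request leaves the host; it does not isolate that delay the way a packet capture would. `time_runs` is a hypothetical helper.

```python
import subprocess
import time


def time_runs(cmd, runs=5):
    """Run cmd `runs` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.monotonic()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.monotonic() - start)
    return durations
```

For example, `time_runs(["snmpget", "-v2c", "-c", "public", "192.0.2.10", "sysUpTime.0"])`, where the host address and community string are placeholders to replace with a real device. Comparing results against snmpget's own timeout and retry settings helps tell a local stall apart from a slow device.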
I tuned "net.core.somaxconn" from 128 to 1024 in /etc/sysctl.conf, then rebooted. Unfortunately, it did not help.
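After a change in /etc/sysctl.conf, it is worth confirming that the running kernel actually picked up the new value (the same check as running `sysctl net.core.somaxconn`). On Linux, each dotted sysctl key maps to a file under /proc/sys. A minimal sketch; the helper names are hypothetical, and the simple dot-to-slash mapping assumes no path component itself contains a dot (some interface names do).

```python
from pathlib import Path


def sysctl_path(key):
    """Map a dotted sysctl key such as 'net.core.somaxconn' to its /proc/sys path."""
    return Path("/proc/sys") / key.replace(".", "/")


def read_sysctl(key):
    """Return the live kernel value for key as a string (Linux only)."""
    return sysctl_path(key).read_text().strip()
```

On the host in question, `read_sysctl("net.core.somaxconn")` should return "1024" if the setting survived the reboot.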
Has anyone seen issues with SNMP delays?
Re: Cacti runtime gets steadily worse; reboot fixes issue
The fact that you are seeing delays suggests to me either too many open connections or something throttling your system. Are your connections going through a firewall?
Re: Cacti runtime gets steadily worse; reboot fixes issue
netniV wrote: The fact that you are seeing delays suggests to me either too many open connections or something throttling your system. Are your connections going through a firewall?
Thanks for the suggestion; yes, this box is directly behind a firewall. I wasn't able to identify the bottleneck, so I threw more hardware at the problem: specifically, I allocated additional processors and RAM to this VM. The Cacti runtime has been a stable 7 seconds since.
Strange, because the earlier output of "top" (on a 1-second update interval) always showed available idle time and no I/O wait time. But there was obviously some OS-level bottleneck that was beyond my troubleshooting ability to find.
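The idle and I/O wait figures that top prints are derived from the cumulative CPU counters in /proc/stat. Sampling them directly and logging the result next to each poller run leaves a record that can be compared with the runtime graph afterwards, instead of watching top live. A minimal sketch, assuming a Linux /proc/stat whose aggregate "cpu" line lists user, nice, system, idle, iowait, irq, softirq and steal in that order; the function names are hypothetical.

```python
import time


def read_cpu_times(path="/proc/stat"):
    """Return the aggregate 'cpu' line of /proc/stat as a list of ints (jiffies)."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(v) for v in line.split()[1:]]
    raise RuntimeError("no aggregate cpu line in " + path)


def cpu_percentages(before, after):
    """Idle, iowait and steal as a percentage of CPU time elapsed between samples."""
    deltas = [a - b for a, b in zip(after, before)]
    # Only the first 8 fields count toward the total; guest time, when
    # reported, is already included in user/nice.
    total = sum(deltas[:8])
    if total == 0:
        return {"idle": 0.0, "iowait": 0.0, "steal": 0.0}
    steal = deltas[7] if len(deltas) > 7 else 0
    return {
        "idle": 100.0 * deltas[3] / total,
        "iowait": 100.0 * deltas[4] / total,
        "steal": 100.0 * steal / total,
    }


def sample_cpu(interval=1.0):
    """Take two samples `interval` seconds apart and return the percentages."""
    before = read_cpu_times()
    time.sleep(interval)
    after = read_cpu_times()
    return cpu_percentages(before, after)
```

The steal column is included only because top shows it (as "st") alongside idle and wait; the thread itself does not identify it as the cause.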
Thanks!
Mark