High CPU after 0.8.7g upgrade

blackbear219
Posts: 6
Joined: Fri Feb 11, 2011 12:15 pm

High CPU after 0.8.7g upgrade

Post by blackbear219 »

Hi all,

We were running CactiEZ v0.6, which is Cacti 0.8.7c. I upgraded us to 0.8.7g (via http://www.spoonapedia.com/2010/07/upgrading-cacti.html) and, after following that procedure, also applied all the 0.8.7g patches. This runs on a VM with 2 CPUs and 4 GB of RAM (both reserved).

Ever since going to 0.8.7g, the CPU load has been high. On 0.8.7c we'd see a load under 2. Right after installing 0.8.7g the load was up around 15 (wow), but after some tweaking I now have it down to around 6. Better, but still too high.

Any ideas how I can get this thing back under control? I've been banging my head against the wall.

Poller interval: Every minute
Cron interval: Every minute
4 poller processes
20 threads
2 script servers
100 max OIDs per request
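
For reference, a 1-minute cron interval corresponds to an entry like the following; the user and install path here are assumptions based on a typical CactiEZ layout, not taken from this system:

```
# /etc/cron.d/cacti -- hypothetical user and path; adjust to your install
*/1 * * * * cactiuser php /var/www/html/cacti/poller.php > /dev/null 2>&1
```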

Code:

02/11/2011 12:09:57 PM - SYSTEM STATS: Time:53.7277 Method:spine Processes:4 Threads:15 Hosts:80 HostsPerProcess:20 DataSources:4053 RRDsProcessed:2055
02/11/2011 12:10:32 PM - SYSTEM STATS: Time:29.8833 Method:spine Processes:4 Threads:15 Hosts:80 HostsPerProcess:20 DataSources:4053 RRDsProcessed:2055
02/11/2011 12:11:27 PM - SYSTEM STATS: Time:25.5916 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4054 RRDsProcessed:2056
02/11/2011 12:12:43 PM - SYSTEM STATS: Time:39.6624 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4051 RRDsProcessed:2053
02/11/2011 12:13:23 PM - SYSTEM STATS: Time:22.2578 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4052 RRDsProcessed:2054
02/11/2011 12:15:50 PM - POLLER: Poller[0] WARNING: Cron is out of sync with the Poller Interval!  The Poller Interval is '60' seconds, with a maximum of a '300' second Cron, but 169 seconds have passed since the last poll!
02/11/2011 12:16:12 PM - SYSTEM STATS: Time:22.6099 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4053 RRDsProcessed:2055
02/11/2011 12:17:02 PM - POLLER: Poller[0] WARNING: Cron is out of sync with the Poller Interval!  The Poller Interval is '60' seconds, with a maximum of a '300' second Cron, but 72 seconds have passed since the last poll!
02/11/2011 12:18:21 PM - SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal
02/11/2011 12:18:21 PM - SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal
02/11/2011 12:18:21 PM - SPINE: Poller[0] ERROR: Spine Timed Out While Processing Hosts Internal
02/11/2011 12:18:22 PM - POLLER: Poller[0] Maximum runtime of 58 seconds exceeded. Exiting.
02/11/2011 12:18:22 PM - SYSTEM STATS: Time:80.1919 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4053 RRDsProcessed:1302
02/11/2011 12:18:23 PM - POLLER: Poller[0] WARNING: Cron is out of sync with the Poller Interval!  The Poller Interval is '60' seconds, with a maximum of a '300' second Cron, but 81 seconds have passed since the last poll!
02/11/2011 12:18:23 PM - POLLER: Poller[0] WARNING: Poller Output Table not Empty.  Issues Found: 253, Data Sources: traffic_in(DS[464]), traffic_out(DS[464]), traffic_in(DS[465]), traffic_out(DS[465]), traffic_in(DS[466]), traffic_out(DS[466]), traffic_in(DS[467]), traffic_out(DS[467]), traffic_in(DS[468]), traffic_out(DS[468]), traffic_in(DS[469]), traffic_out(DS[469]), traffic_in(DS[470]), traffic_out(DS[470]), traffic_in(DS[471]), traffic_out(DS[471]), traffic_in(DS[472]), traffic_out(DS[472]), traffic_in(DS[473]), traffic_out(DS[473]), traffic_in(DS[474]), Additional Issues Remain.  Only showing first 20
02/11/2011 12:18:44 PM - SYSTEM STATS: Time:21.8209 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4058 RRDsProcessed:2060
Thanks!
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: High CPU after 0.8.7g upgrade

Post by gandalf »

Why did you use an upgrade procedure outside of the official documentation? Is anything missing in our docs?
Did you upgrade spine as well?
R.

Re: High CPU after 0.8.7g upgrade

Post by blackbear219 »

I chose to follow that particular write-up, but it is the same procedure as your official documentation. Yes, I did update Spine as well.

Spine runs in anywhere from 10 to 70 seconds, with a CPU load of about 6 right now, and that is with plugins disabled. I can't identify exactly what is causing so much more CPU load in 0.8.7g.

Code:

[root@localhost html]# uptime
 13:27:05 up 19:56,  3 users,  load average: 7.37, 6.05, 5.67
[root@localhost html]# grep spine log/cacti.log
02/11/2011 12:59:02 PM - SYSTEM STATS: Time:10.8368 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4061 RRDsProcessed:2063
02/11/2011 01:01:11 PM - SYSTEM STATS: Time:68.8859 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4046 RRDsProcessed:2048
02/11/2011 01:01:48 PM - SYSTEM STATS: Time:44.2979 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4052 RRDsProcessed:2054
02/11/2011 01:02:20 PM - SYSTEM STATS: Time:15.8567 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4052 RRDsProcessed:2054
02/11/2011 01:04:30 PM - SYSTEM STATS: Time:24.8754 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4052 RRDsProcessed:2054
02/11/2011 01:05:43 PM - SYSTEM STATS: Time:41.3234 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4061 RRDsProcessed:2063
02/11/2011 01:07:45 PM - SYSTEM STATS: Time:41.8475 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4046 RRDsProcessed:2048
02/11/2011 01:09:00 PM - SYSTEM STATS: Time:56.9458 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4052 RRDsProcessed:2054
02/11/2011 01:09:31 PM - SYSTEM STATS: Time:28.2717 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4052 RRDsProcessed:2054
02/11/2011 01:11:17 PM - SYSTEM STATS: Time:25.4090 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4052 RRDsProcessed:2054
02/11/2011 01:12:17 PM - SYSTEM STATS: Time:15.7336 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4061 RRDsProcessed:2063
02/11/2011 01:14:29 PM - SYSTEM STATS: Time:23.3770 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4046 RRDsProcessed:2048
02/11/2011 01:15:41 PM - SYSTEM STATS: Time:39.5356 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4052 RRDsProcessed:2054
02/11/2011 01:17:38 PM - SYSTEM STATS: Time:30.7001 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4052 RRDsProcessed:2054
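
To put a single number on those runtimes, the Time: field in the SYSTEM STATS lines can be averaged with a small awk one-liner (a sketch; it assumes the log format shown above and is run from the Cacti install directory):

```shell
# Average spine runtime from SYSTEM STATS lines in the Cacti log
awk '/SYSTEM STATS/ {
  for (i = 1; i <= NF; i++)
    if ($i ~ /^Time:/) { sub(/^Time:/, "", $i); sum += $i; n++ }
} END { if (n) printf "%.2f sec average over %d runs\n", sum / n, n }' log/cacti.log
```

Over the 14 runs shown above, that works out to roughly 33 seconds per poll: more than half the 60-second interval.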

Re: High CPU after 0.8.7g upgrade

Post by blackbear219 »

In another effort to lighten the load, I also tried going to 5-minute intervals on the poller and cron, as opposed to my current setting of 1 minute. I am monitoring mostly Cisco routers/switches. When I make this change and reload the poller cache, the CPU/memory/temperature graphs keep updating, but the interface/traffic graphs stop. When I change it back to 1 minute they start graphing again. I can't figure out why that is either.

Sorry to be a bother, I appreciate your time and advice.

Re: High CPU after 0.8.7g upgrade

Post by blackbear219 »

Interestingly enough, after my attempt to go to 5-minute polling failed as explained above, I set it back to 1-minute intervals and all of a sudden things are running splendidly.

I've started turning our plugins back on one by one and monitoring performance. Weathermap and thold are up now; spine is still finishing its run in 10 seconds or less, and the CPU load is normal. Things are looking pretty good right now.

Code:

[root@localhost html]# uptime
 16:34:54 up 22:30,  3 users,  load average: 1.12, 1.50, 1.63
[root@localhost html]# grep spine log/cacti.log
02/11/2011 04:19:12 PM - SYSTEM STATS: Time:10.5442 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4002 RRDsProcessed:2004
02/11/2011 04:20:09 PM - SYSTEM STATS: Time:7.3843 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4255 RRDsProcessed:2257
02/11/2011 04:21:09 PM - SYSTEM STATS: Time:8.0176 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4002 RRDsProcessed:2004
02/11/2011 04:22:08 PM - SYSTEM STATS: Time:6.6215 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4002 RRDsProcessed:2004
02/11/2011 04:23:08 PM - SYSTEM STATS: Time:7.1416 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4002 RRDsProcessed:2004
02/11/2011 04:24:06 PM - SYSTEM STATS: Time:5.0333 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4002 RRDsProcessed:2004
02/11/2011 04:25:08 PM - SYSTEM STATS: Time:4.8943 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4255 RRDsProcessed:2257
02/11/2011 04:26:08 PM - SYSTEM STATS: Time:7.1947 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4002 RRDsProcessed:2004
02/11/2011 04:27:11 PM - SYSTEM STATS: Time:9.0058 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4002 RRDsProcessed:2004
02/11/2011 04:28:08 PM - SYSTEM STATS: Time:6.6181 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4002 RRDsProcessed:2004
02/11/2011 04:29:08 PM - SYSTEM STATS: Time:6.2243 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4002 RRDsProcessed:2004
02/11/2011 04:30:09 PM - SYSTEM STATS: Time:7.5430 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4255 RRDsProcessed:2257
02/11/2011 04:31:05 PM - SYSTEM STATS: Time:4.1482 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4002 RRDsProcessed:2004
02/11/2011 04:32:04 PM - SYSTEM STATS: Time:3.3966 Method:spine Processes:4 Threads:20 Hosts:80 HostsPerProcess:20 DataSources:4002 RRDsProcessed:2004

Re: High CPU after 0.8.7g upgrade

Post by gandalf »

blackbear219 wrote:In another effort to lighten the load, I've also tried to go to 5 minute intervals on the poller and cron as opposed to my current settings of 1 minute. I am monitoring mostly Cisco routers/switches. When I make this change and reload the cache, the cpu/memory/temp/etc graphs keep graphing, but the interface/traffic graphs stop graphing. I change it back to 1 minute and they start graphing again, I can't seem to figure out why this is either.

Sorry to be a bother, I appreciate your time and advice.
You CAN'T switch poller intervals that way! RRD files are strictly bound to a SPECIFIC polling interval, and unfortunately rrdtool does not support switching between them. So, all in all, this step does not help.
R.

Re: High CPU after 0.8.7g upgrade

Post by gandalf »

blackbear219 wrote:Interestingly enough when my attempt to go to 5 minute polling failed as explained above, I set it back to 1 minute intervals and all of a sudden things are running splendidly.

I've started turning our plugins back on one-by-one and monitoring performance. Weathermap and thold are up now, spine is still finishing its run in 10 seconds or less and the cpu load is normal. Things are looking pretty good right now.
Bad for us, good for you. But don't ask me why things are fine now. I strongly suspect that your rrd files may not match your intentions (from your previous post) and that you'll run into some issue sooner or later. Please bear in mind that the default RRA settings DO NOT MATCH 1-minute polling. You'll have to change them, and along with this you'll have to delete all existing rrd files, losing all your data.
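
To illustrate why the defaults don't match: retention per RRA is step x steps x rows. Using Cacti's classic default RRA definitions (assumed here), the row counts that keep about two days of fine-grained data at a 300 s step keep only ten hours at 60 s:

```shell
# Cacti's classic default RRAs (steps:rows), sized for a 300 s poller step.
# Retention in hours = step * steps * rows / 3600.
specs="1:600 6:700 24:775 288:797"
for step in 300 60; do
  for s in $specs; do
    steps=${s%%:*}; rows=${s##*:}
    echo "step=${step}s  RRA ${steps}:${rows}  -> $(( step * steps * rows / 3600 )) hours"
  done
done
```

This is why 1-minute polling needs larger row counts in the RRA definitions, and why the change only takes effect in freshly created RRD files.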
R.

Re: High CPU after 0.8.7g upgrade

Post by blackbear219 »

Hmmm, ok, interesting. Well, with all that being said, is there documentation anywhere that tells you exactly what needs to be done to change the polling interval? I looked around but could not find anything, so I just went with my approach, which was apparently incomplete. The data is not much of an issue; we are just in the process of setting this system up and it is not "live" yet, so if I have to delete my RRDs, that is fine.

Re: High CPU after 0.8.7g upgrade

Post by gandalf »

This is not "officially" supported; I would call it "broken". But for those who know, http://forums.cacti.net/viewtopic.php?f=6&t=23885 may help.
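
For anyone following along: since the data is disposable here, the blunt version is to change the poller interval and RRA settings first, then remove the old RRD files so Cacti recreates them at the new step. A sketch, assuming a default install path; this permanently deletes all graph history:

```shell
# DESTRUCTIVE: removes all historical graph data.
# /var/www/html/cacti is an assumed install path -- adjust to yours.
cd /var/www/html/cacti/rra && find . -name '*.rrd' -delete
```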
R.