wierd polling time creep

Post general support questions here that do not specifically fall into the Linux or Windows categories.

Moderators: Developers, Moderators

MagicOneXXX
Cacti User
Posts: 59
Joined: Tue Dec 19, 2006 4:35 pm

wierd polling time creep

Post by MagicOneXXX »

lately I've been having a weird problem with cacti. It runs stable until I add another host and start graphing it. Once I do, my system stats slowly creep up from ~17 seconds into the 100's. Doing a rebuild of the poller cache resets my time to ~17 seconds, but it continues to creep up again.

Here is an example of the 'poller creep':

04/12/2007 04:54:23 PM - SYSTEM STATS: Time:17.8759 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1818 RRDsProcessed:934
04/12/2007 04:55:27 PM - SYSTEM STATS: Time:22.1139 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1814 RRDsProcessed:932
04/12/2007 04:56:29 PM - SYSTEM STATS: Time:24.1165 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1815 RRDsProcessed:934
04/12/2007 04:57:28 PM - SYSTEM STATS: Time:22.0784 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1817 RRDsProcessed:934
04/12/2007 04:58:28 PM - SYSTEM STATS: Time:23.0596 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1815 RRDsProcessed:933
04/12/2007 04:59:26 PM - SYSTEM STATS: Time:21.4659 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1818 RRDsProcessed:934
04/12/2007 05:00:30 PM - SYSTEM STATS: Time:25.5943 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1832 RRDsProcessed:941
04/12/2007 05:01:35 PM - SYSTEM STATS: Time:30.5847 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1833 RRDsProcessed:943
04/12/2007 05:02:32 PM - SYSTEM STATS: Time:27.8288 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1835 RRDsProcessed:943
04/12/2007 05:03:33 PM - SYSTEM STATS: Time:29.0359 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1833 RRDsProcessed:942
04/12/2007 05:04:34 PM - SYSTEM STATS: Time:31.3829 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1836 RRDsProcessed:943
04/12/2007 05:05:35 PM - SYSTEM STATS: Time:30.5828 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1832 RRDsProcessed:941
04/12/2007 05:06:36 PM - SYSTEM STATS: Time:30.6497 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1833 RRDsProcessed:943
04/12/2007 05:07:43 PM - SYSTEM STATS: Time:39.5587 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1835 RRDsProcessed:943
04/12/2007 05:08:45 PM - SYSTEM STATS: Time:40.2732 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1833 RRDsProcessed:942

Etc. on up. This occurs with different hosts, so it's not the host i'm trying to graph.

Any thoughts?
User avatar
BSOD2600
Cacti Moderator
Posts: 12171
Joined: Sat May 08, 2004 12:44 pm
Location: USA

Post by BSOD2600 »

I gather you have the 1 min polling patch installed? Any plugins too?
I also see the number of datasources is changing with the polling times -- are devices going up/down or you polling different ones at different time intervals?
MagicOneXXX
Cacti User
Posts: 59
Joined: Tue Dec 19, 2006 4:35 pm

System info

Post by MagicOneXXX »

Here is my system info:
Cacti Version - 0.8.6i
Plugin Architecture - 1.0
Poller Type - Cactid v0.8.6i
Server Info - Linux 2.6.16.21-0.8-smp
Web Server - Apache/2.2.0 (Linux/SUSE)
PHP - 5.1.2
PHP Extensions - libxml, xml, standard, SimpleXML, SPL, session, Reflection, date, pcre, apache2handler, gd, mysql, mysqli, snmp, sockets
MySQL - 5.0.18
RRDTool - 1.2.12
SNMP - 5.3.0.1
Plugins
  • Thresholds (thold - v0.3.0)
    Device Tracking (mactrack - v0.0.1b)
    Network Discovery (discovery - v0.7)
    Network Tools (tools - v0.2)
    Device Monitoring (monitor - v0.7)
    NTop Viewer (ntop - v0.1)
    PHP Network Weathermap (weathermap - v0.9pre2)
    Report Creator (reports - v0.1b)
    Update Checker (update - v0.3)
    Host Info (hostinfo - v0.1)
Mactrack polling is disabled at the moment.

Hardware info:
Intel Xeon 2.8GHz HT
512MB Ram

For the most part, we're using 1 minute polling for about 95% of our devices. For a few devices, over in another country, we use 5 minute polling. That's why there is a little variance.

We poll quite a few switches, so we're pulling back 24-48 data sources per device. We are also pulling back 2 core switches that each have 5 blades and somewhere around 100 ports.

I thought RAM might be the issue, but i'd expect these sort of results only if the poller went above 60 seconds and polling started to 'stumble' upon itself.

Any thoughts?
MagicOneXXX
Cacti User
Posts: 59
Joined: Tue Dec 19, 2006 4:35 pm

another observation

Post by MagicOneXXX »

I've been monitoring the server before and after I add another device, and discovered that after adding the device, additional cron processes spawn. This seems to cause more php processes to spawn as well.

I've tried this a couple of times - disabling the device and letting the server 'stabilize' again - then re-enabling the device and watching the additonal processes spawn.
MagicOneXXX
Cacti User
Posts: 59
Joined: Tue Dec 19, 2006 4:35 pm

Poller output results

Post by MagicOneXXX »

I've done more digging, and ended up pushing the poller cron results (php -q poller.php) to a log file. here is a snippet of that log that looks interesting:

OK u:0.06 s:0.63 r:23.75
OK u:0.06 s:0.64 r:24.07
OK u:0.06 s:0.64 r:24.30
OK u:0.06 s:0.65 r:24.41
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^$
OK u:0.08 s:1.40 r:45.18
OK u:0.08 s:1.41 r:45.35
OK u:0.08 s:1.41 r:45.53
OK u:0.08 s:1.42 r:45.71
OK u:0.08 s:1.42 r:45.93

There is a huge jump in stats with the odd ^@ stuff between it. I don't think this is normal. The ending stats of this particular run are:

OK u:0.10 s:2.58 r:71.93

I'm guessing the r: is the total runtime?

Also, the system stats seem to be posted mid-run:

OK u:0.06 s:0.32 r:17.50
OK u:0.06 s:0.32 r:17.54
OK u:0.06 s:0.32 r:17.55
04/13/2007 04:50:32 PM - SYSTEM STATS: Time:29.2292 Method:cactid Processes:13 Threads:10 Hosts:54 HostsPerProcess:5 DataSources:1786 RRDsProcessed:918
OK u:0.06 s:0.32 r:17.56
OK u:0.06 s:0.32 r:17.57
OK u:0.06 s:0.32 r:17.58
OK u:0.06 s:0.32 r:17.58

Which seems odd to me. Shouldn't it be posted at the end of the run?

Also, if i open the log file in the beginning of a run, the file starts out with:

OK u:0.08 s:1.55 r:45.34
OK u:0.08 s:1.56 r:45.45
OK u:0.08 s:1.56 r:45.59
OK u:0.08 s:1.56 r:45.71
OK u:0.08 s:1.56 r:45.75
OK u:0.08 s:1.56 r:45.78
OK u:0.08 s:1.56 r:45.93
OK u:0.08 s:1.56 r:45.97
OK u:0.08 s:1.57 r:46.05

and if I open it again mid-run, i get the funky

OK u:0.00 s:0.01 r:2.21
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^$
OK u:0.08 s:1.55 r:45.34

in the log.

Combine this with the fact that there are multiple cron jobs that spawn, i'm guessing there is some double-polling going on.

Thoughts?
MagicOneXXX
Cacti User
Posts: 59
Joined: Tue Dec 19, 2006 4:35 pm

ps aux results

Post by MagicOneXXX »

Here is the results of a ps aux:

root 16948 0.0 0.1 1820 540 ? Ss 17:44 0:00 /usr/sbin/cron
root 16951 0.0 0.1 2420 824 ? S 17:45 0:00 /usr/sbin/cron
wwwrun 16952 0.0 0.2 2524 1104 ? Ss 17:45 0:00 /bin/bash -c /usr/bin/php5 -q /srv/www/htdocs/cacti/poller.php > /var/log/cacti/poller.log 2>&1
wwwrun 16953 1.7 2.3 52812 12068 ? D 17:45 0:01 /usr/bin/php5 -q /srv/www/htdocs/cacti/poller.php
root 17404 0.0 0.1 2420 824 ? S 17:46 0:00 /usr/sbin/cron
wwwrun 17405 0.0 0.2 2524 1100 ? Ss 17:46 0:00 /bin/bash -c /usr/bin/php5 -q /srv/www/htdocs/cacti/poller.php > /var/log/cacti/poller.log 2>&1
wwwrun 17406 6.5 2.4 53340 12436 ? S 17:46 0:01 /usr/bin/php5 -q /srv/www/htdocs/cacti/poller.php

and here is my crontab:

SHELL=/bin/bash
PATH=/usr/bin:/usr/sbin:/sbin:/bin:/usr/lib/news/bin:/usr/local/bin
#
# check scripts in cron.hourly, cron.daily, cron.weekly, and cron.monthly
#
*/1 * * * * wwwrun /usr/bin/php5 -q /srv/www/htdocs/cacti/poller.php > var/log/cacti/poller.log 2>&1


Def. a double poll. Another cron process is spawned right after the first, and both spawn a polling cycle, causing a double poll.

Suggestions?
User avatar
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany
Contact:

Post by gandalf »

Please chack all crontabs
- /etc/crontab
- /etc/cron.d/cacti
- crontabs of users root/cactiuser
Reinhard
MagicOneXXX
Cacti User
Posts: 59
Joined: Tue Dec 19, 2006 4:35 pm

no go

Post by MagicOneXXX »

No go :( I checked all my cron schedulers; I removed everything that was in cron.daily (mostly logrotate stuff) and restarted cron. A cron process still spawns right after it starts running the poller.

See above for what's in my crontab.

Anyone else experience this odd issue?
MagicOneXXX
Cacti User
Posts: 59
Joined: Tue Dec 19, 2006 4:35 pm

over time?

Post by MagicOneXXX »

Ok, so I killed all of the cron jobs and running php processes, waited a bit, then ran 'time /usr/bin/php5 /srv/www/htdocs/cacti/poller.php'. The poller ends up reporting a total time of about 7 seconds, but the process continues on and ends up running a total of 1min 10 seconds. I think this may be causing an overrun, so another cron process spawns before the last one finished.

Why is the poller reporting such low process times, when the total time to poll is really much higher?
User avatar
fmangeant
Cacti Guru User
Posts: 2345
Joined: Fri Sep 19, 2003 8:36 am
Location: Sophia-Antipolis, France
Contact:

Post by fmangeant »

Hi

can you try with 2 processes and 15 theads ?

You can see this thread from Reinhard : http://forums.cacti.net/viewtopic.php?p=97180#97180
[size=84]
[color=green]HOWTOs[/color] :
[list][*][url=http://forums.cacti.net/viewtopic.php?t=15353]Install and configure the Net-SNMP agent for Unix[/url]
[*][url=http://forums.cacti.net/viewtopic.php?t=26151]Install and configure the Net-SNMP agent for Windows[/url]
[*][url=http://forums.cacti.net/viewtopic.php?t=28175]Graph multiple servers using an SNMP proxy[/url][/list]
[color=green]Templates[/color] :
[list][*][url=http://forums.cacti.net/viewtopic.php?t=15412]Multiple CPU usage for Linux[/url]
[*][url=http://forums.cacti.net/viewtopic.php?p=125152]Memory & swap usage for Unix[/url][/list][/size]
MagicOneXXX
Cacti User
Posts: 59
Joined: Tue Dec 19, 2006 4:35 pm

Tried without success

Post by MagicOneXXX »

I set my poller settings to what is recommended, but to no avial. Poller still takes over 1 minute to complete. Poller reports it takes alot less:

OK u:0.06 s:0.29 r:7.65
OK u:0.06 s:0.29 r:7.66
OK u:0.06 s:0.29 r:7.68
OK u:0.06 s:0.29 r:7.71
04/16/2007 10:50:13 AM - SYSTEM STATS: Time:11.8671 Method:cactid Processes:4 Threads:15 Hosts:54 HostsPerProcess:14 DataSources:1784 RRDsProcessed:917
OK u:0.06 s:0.29 r:7.72
OK u:0.06 s:0.29 r:7.73
OK u:0.06 s:0.29 r:7.73
OK u:0.06 s:0.29 r:7.74
OK u:0.06 s:0.29 r:7.75

And I don't even have that many datasources/hosts! :o

Only bottleneck I can think of is RAM (Disks are 15k rpm) which is low at 512mb (but, free is telling me it still has 373MB ram free for programs. Most of that is going to disk caching). RAM is on order to up it to 1.5G, but that will take a few weeks to get in.

This looks alot more like a performance issue now.

Thanks thus far for all your suggestions!
User avatar
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA
Contact:

Post by TheWitness »

Make sure you don't have extra poller processes running. When you were running 13 process, I just about died. Keep in mind that if you run script server, you also get X PHP Script Server processes and who know's how many scripts running.

This can overwhelm a system quick. If your poller finishes in 11 seconds (as it should) and you continue to receive "OK" messages after the poller completes, this is RRDtool hanging up on RRDupdates.

Please keep in mind that your RRA folder MUST or should never be NFS mounted. If it is, figure something else out, please.

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
MagicOneXXX
Cacti User
Posts: 59
Joined: Tue Dec 19, 2006 4:35 pm

thanks

Post by MagicOneXXX »

Thanks for the advice, TheWitness! I didn't know what the limits should be on the concurrent processes, as there isn't anything listed for recommendations in the settings page (at least in 0.8.6i).

According to the Disk I/O, the server is writing about 33mb per minute to disk, which it should have no problem doing with 15k rpm SCSI drives. What else could possibly bottleneck rrdtool?

FYI, the drives are reiserfs.
User avatar
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA
Contact:

Post by TheWitness »

Here is "how" you test things:

1) Run poller.php outside of CRON. All of your RRDupdates will appear on the screen
2) While running, watch for "cactid" processes while using top or ps -ef | grep cactid.
3) When the cactid processes end, time the time it takes for all the "OK" messages to clear off of the screen.

The time difference between the last cactid process completing and the last "OK" message will be disk/rrdupdate delay.

Let me know what you find. Could be you have excellent disks and no disk cache (aka poor drivers), which will make the best disk look like crap.

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
MagicOneXXX
Cacti User
Posts: 59
Joined: Tue Dec 19, 2006 4:35 pm

Post by MagicOneXXX »

here are my stats:

Cactid time: 11.3245 seconds
Total poller.php time: 1m5s
Post Reply

Who is online

Users browsing this forum: No registered users and 0 guests