wierd polling time creep
Moderators: Developers, Moderators
-
- Cacti User
- Posts: 59
- Joined: Tue Dec 19, 2006 4:35 pm
wierd polling time creep
lately I've been having a weird problem with cacti. It runs stable until I add another host and start graphing it. Once I do, my system stats slowly creep up from ~17 seconds into the 100's. Doing a rebuild of the poller cache resets my time to ~17 seconds, but it continues to creep up again.
Here is an example of the 'poller creep':
04/12/2007 04:54:23 PM - SYSTEM STATS: Time:17.8759 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1818 RRDsProcessed:934
04/12/2007 04:55:27 PM - SYSTEM STATS: Time:22.1139 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1814 RRDsProcessed:932
04/12/2007 04:56:29 PM - SYSTEM STATS: Time:24.1165 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1815 RRDsProcessed:934
04/12/2007 04:57:28 PM - SYSTEM STATS: Time:22.0784 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1817 RRDsProcessed:934
04/12/2007 04:58:28 PM - SYSTEM STATS: Time:23.0596 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1815 RRDsProcessed:933
04/12/2007 04:59:26 PM - SYSTEM STATS: Time:21.4659 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1818 RRDsProcessed:934
04/12/2007 05:00:30 PM - SYSTEM STATS: Time:25.5943 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1832 RRDsProcessed:941
04/12/2007 05:01:35 PM - SYSTEM STATS: Time:30.5847 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1833 RRDsProcessed:943
04/12/2007 05:02:32 PM - SYSTEM STATS: Time:27.8288 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1835 RRDsProcessed:943
04/12/2007 05:03:33 PM - SYSTEM STATS: Time:29.0359 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1833 RRDsProcessed:942
04/12/2007 05:04:34 PM - SYSTEM STATS: Time:31.3829 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1836 RRDsProcessed:943
04/12/2007 05:05:35 PM - SYSTEM STATS: Time:30.5828 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1832 RRDsProcessed:941
04/12/2007 05:06:36 PM - SYSTEM STATS: Time:30.6497 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1833 RRDsProcessed:943
04/12/2007 05:07:43 PM - SYSTEM STATS: Time:39.5587 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1835 RRDsProcessed:943
04/12/2007 05:08:45 PM - SYSTEM STATS: Time:40.2732 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1833 RRDsProcessed:942
Etc. on up. This occurs with different hosts, so it's not the host i'm trying to graph.
Any thoughts?
Here is an example of the 'poller creep':
04/12/2007 04:54:23 PM - SYSTEM STATS: Time:17.8759 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1818 RRDsProcessed:934
04/12/2007 04:55:27 PM - SYSTEM STATS: Time:22.1139 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1814 RRDsProcessed:932
04/12/2007 04:56:29 PM - SYSTEM STATS: Time:24.1165 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1815 RRDsProcessed:934
04/12/2007 04:57:28 PM - SYSTEM STATS: Time:22.0784 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1817 RRDsProcessed:934
04/12/2007 04:58:28 PM - SYSTEM STATS: Time:23.0596 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1815 RRDsProcessed:933
04/12/2007 04:59:26 PM - SYSTEM STATS: Time:21.4659 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1818 RRDsProcessed:934
04/12/2007 05:00:30 PM - SYSTEM STATS: Time:25.5943 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1832 RRDsProcessed:941
04/12/2007 05:01:35 PM - SYSTEM STATS: Time:30.5847 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1833 RRDsProcessed:943
04/12/2007 05:02:32 PM - SYSTEM STATS: Time:27.8288 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1835 RRDsProcessed:943
04/12/2007 05:03:33 PM - SYSTEM STATS: Time:29.0359 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1833 RRDsProcessed:942
04/12/2007 05:04:34 PM - SYSTEM STATS: Time:31.3829 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1836 RRDsProcessed:943
04/12/2007 05:05:35 PM - SYSTEM STATS: Time:30.5828 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1832 RRDsProcessed:941
04/12/2007 05:06:36 PM - SYSTEM STATS: Time:30.6497 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1833 RRDsProcessed:943
04/12/2007 05:07:43 PM - SYSTEM STATS: Time:39.5587 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1835 RRDsProcessed:943
04/12/2007 05:08:45 PM - SYSTEM STATS: Time:40.2732 Method:cactid Processes:13 Threads:10 Hosts:55 HostsPerProcess:5 DataSources:1833 RRDsProcessed:942
Etc. on up. This occurs with different hosts, so it's not the host i'm trying to graph.
Any thoughts?
I gather you have the 1 min polling patch installed? Any plugins too?
I also see the number of datasources is changing with the polling times -- are devices going up/down or you polling different ones at different time intervals?
I also see the number of datasources is changing with the polling times -- are devices going up/down or you polling different ones at different time intervals?
| Scripts: Monitor processes | RFC1213 MIB | DOCSIS Stats | Dell PowerEdge | Speedfan | APC UPS | DOCSIS CMTS | 3ware | Motorola Canopy |
| Guides: Windows Install | [HOWTO] Debug Windows NTFS permission problems |
| Tools: Windows All-in-one Installer |
-
- Cacti User
- Posts: 59
- Joined: Tue Dec 19, 2006 4:35 pm
System info
Here is my system info:
Hardware info:
Intel Xeon 2.8GHz HT
512MB Ram
For the most part, we're using 1 minute polling for about 95% of our devices. For a few devices, over in another country, we use 5 minute polling. That's why there is a little variance.
We poll quite a few switches, so we're pulling back 24-48 data sources per device. We are also pulling back 2 core switches that each have 5 blades and somewhere around 100 ports.
I thought RAM might be the issue, but i'd expect these sort of results only if the poller went above 60 seconds and polling started to 'stumble' upon itself.
Any thoughts?
Mactrack polling is disabled at the moment.Cacti Version - 0.8.6i
Plugin Architecture - 1.0
Poller Type - Cactid v0.8.6i
Server Info - Linux 2.6.16.21-0.8-smp
Web Server - Apache/2.2.0 (Linux/SUSE)
PHP - 5.1.2
PHP Extensions - libxml, xml, standard, SimpleXML, SPL, session, Reflection, date, pcre, apache2handler, gd, mysql, mysqli, snmp, sockets
MySQL - 5.0.18
RRDTool - 1.2.12
SNMP - 5.3.0.1
Plugins
- Thresholds (thold - v0.3.0)
Device Tracking (mactrack - v0.0.1b)
Network Discovery (discovery - v0.7)
Network Tools (tools - v0.2)
Device Monitoring (monitor - v0.7)
NTop Viewer (ntop - v0.1)
PHP Network Weathermap (weathermap - v0.9pre2)
Report Creator (reports - v0.1b)
Update Checker (update - v0.3)
Host Info (hostinfo - v0.1)
Hardware info:
Intel Xeon 2.8GHz HT
512MB Ram
For the most part, we're using 1 minute polling for about 95% of our devices. For a few devices, over in another country, we use 5 minute polling. That's why there is a little variance.
We poll quite a few switches, so we're pulling back 24-48 data sources per device. We are also pulling back 2 core switches that each have 5 blades and somewhere around 100 ports.
I thought RAM might be the issue, but i'd expect these sort of results only if the poller went above 60 seconds and polling started to 'stumble' upon itself.
Any thoughts?
-
- Cacti User
- Posts: 59
- Joined: Tue Dec 19, 2006 4:35 pm
another observation
I've been monitoring the server before and after I add another device, and discovered that after adding the device, additional cron processes spawn. This seems to cause more php processes to spawn as well.
I've tried this a couple of times - disabling the device and letting the server 'stabilize' again - then re-enabling the device and watching the additonal processes spawn.
I've tried this a couple of times - disabling the device and letting the server 'stabilize' again - then re-enabling the device and watching the additonal processes spawn.
-
- Cacti User
- Posts: 59
- Joined: Tue Dec 19, 2006 4:35 pm
Poller output results
I've done more digging, and ended up pushing the poller cron results (php -q poller.php) to a log file. here is a snippet of that log that looks interesting:
OK u:0.06 s:0.63 r:23.75
OK u:0.06 s:0.64 r:24.07
OK u:0.06 s:0.64 r:24.30
OK u:0.06 s:0.65 r:24.41
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^$
OK u:0.08 s:1.40 r:45.18
OK u:0.08 s:1.41 r:45.35
OK u:0.08 s:1.41 r:45.53
OK u:0.08 s:1.42 r:45.71
OK u:0.08 s:1.42 r:45.93
There is a huge jump in stats with the odd ^@ stuff between it. I don't think this is normal. The ending stats of this particular run are:
OK u:0.10 s:2.58 r:71.93
I'm guessing the r: is the total runtime?
Also, the system stats seem to be posted mid-run:
OK u:0.06 s:0.32 r:17.50
OK u:0.06 s:0.32 r:17.54
OK u:0.06 s:0.32 r:17.55
04/13/2007 04:50:32 PM - SYSTEM STATS: Time:29.2292 Method:cactid Processes:13 Threads:10 Hosts:54 HostsPerProcess:5 DataSources:1786 RRDsProcessed:918
OK u:0.06 s:0.32 r:17.56
OK u:0.06 s:0.32 r:17.57
OK u:0.06 s:0.32 r:17.58
OK u:0.06 s:0.32 r:17.58
Which seems odd to me. Shouldn't it be posted at the end of the run?
Also, if i open the log file in the beginning of a run, the file starts out with:
OK u:0.08 s:1.55 r:45.34
OK u:0.08 s:1.56 r:45.45
OK u:0.08 s:1.56 r:45.59
OK u:0.08 s:1.56 r:45.71
OK u:0.08 s:1.56 r:45.75
OK u:0.08 s:1.56 r:45.78
OK u:0.08 s:1.56 r:45.93
OK u:0.08 s:1.56 r:45.97
OK u:0.08 s:1.57 r:46.05
and if I open it again mid-run, i get the funky
OK u:0.00 s:0.01 r:2.21
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^$
OK u:0.08 s:1.55 r:45.34
in the log.
Combine this with the fact that there are multiple cron jobs that spawn, i'm guessing there is some double-polling going on.
Thoughts?
OK u:0.06 s:0.63 r:23.75
OK u:0.06 s:0.64 r:24.07
OK u:0.06 s:0.64 r:24.30
OK u:0.06 s:0.65 r:24.41
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^$
OK u:0.08 s:1.40 r:45.18
OK u:0.08 s:1.41 r:45.35
OK u:0.08 s:1.41 r:45.53
OK u:0.08 s:1.42 r:45.71
OK u:0.08 s:1.42 r:45.93
There is a huge jump in stats with the odd ^@ stuff between it. I don't think this is normal. The ending stats of this particular run are:
OK u:0.10 s:2.58 r:71.93
I'm guessing the r: is the total runtime?
Also, the system stats seem to be posted mid-run:
OK u:0.06 s:0.32 r:17.50
OK u:0.06 s:0.32 r:17.54
OK u:0.06 s:0.32 r:17.55
04/13/2007 04:50:32 PM - SYSTEM STATS: Time:29.2292 Method:cactid Processes:13 Threads:10 Hosts:54 HostsPerProcess:5 DataSources:1786 RRDsProcessed:918
OK u:0.06 s:0.32 r:17.56
OK u:0.06 s:0.32 r:17.57
OK u:0.06 s:0.32 r:17.58
OK u:0.06 s:0.32 r:17.58
Which seems odd to me. Shouldn't it be posted at the end of the run?
Also, if i open the log file in the beginning of a run, the file starts out with:
OK u:0.08 s:1.55 r:45.34
OK u:0.08 s:1.56 r:45.45
OK u:0.08 s:1.56 r:45.59
OK u:0.08 s:1.56 r:45.71
OK u:0.08 s:1.56 r:45.75
OK u:0.08 s:1.56 r:45.78
OK u:0.08 s:1.56 r:45.93
OK u:0.08 s:1.56 r:45.97
OK u:0.08 s:1.57 r:46.05
and if I open it again mid-run, i get the funky
OK u:0.00 s:0.01 r:2.21
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^$
OK u:0.08 s:1.55 r:45.34
in the log.
Combine this with the fact that there are multiple cron jobs that spawn, i'm guessing there is some double-polling going on.
Thoughts?
-
- Cacti User
- Posts: 59
- Joined: Tue Dec 19, 2006 4:35 pm
ps aux results
Here is the results of a ps aux:
root 16948 0.0 0.1 1820 540 ? Ss 17:44 0:00 /usr/sbin/cron
root 16951 0.0 0.1 2420 824 ? S 17:45 0:00 /usr/sbin/cron
wwwrun 16952 0.0 0.2 2524 1104 ? Ss 17:45 0:00 /bin/bash -c /usr/bin/php5 -q /srv/www/htdocs/cacti/poller.php > /var/log/cacti/poller.log 2>&1
wwwrun 16953 1.7 2.3 52812 12068 ? D 17:45 0:01 /usr/bin/php5 -q /srv/www/htdocs/cacti/poller.php
root 17404 0.0 0.1 2420 824 ? S 17:46 0:00 /usr/sbin/cron
wwwrun 17405 0.0 0.2 2524 1100 ? Ss 17:46 0:00 /bin/bash -c /usr/bin/php5 -q /srv/www/htdocs/cacti/poller.php > /var/log/cacti/poller.log 2>&1
wwwrun 17406 6.5 2.4 53340 12436 ? S 17:46 0:01 /usr/bin/php5 -q /srv/www/htdocs/cacti/poller.php
and here is my crontab:
SHELL=/bin/bash
PATH=/usr/bin:/usr/sbin:/sbin:/bin:/usr/lib/news/bin:/usr/local/bin
#
# check scripts in cron.hourly, cron.daily, cron.weekly, and cron.monthly
#
*/1 * * * * wwwrun /usr/bin/php5 -q /srv/www/htdocs/cacti/poller.php > var/log/cacti/poller.log 2>&1
Def. a double poll. Another cron process is spawned right after the first, and both spawn a polling cycle, causing a double poll.
Suggestions?
root 16948 0.0 0.1 1820 540 ? Ss 17:44 0:00 /usr/sbin/cron
root 16951 0.0 0.1 2420 824 ? S 17:45 0:00 /usr/sbin/cron
wwwrun 16952 0.0 0.2 2524 1104 ? Ss 17:45 0:00 /bin/bash -c /usr/bin/php5 -q /srv/www/htdocs/cacti/poller.php > /var/log/cacti/poller.log 2>&1
wwwrun 16953 1.7 2.3 52812 12068 ? D 17:45 0:01 /usr/bin/php5 -q /srv/www/htdocs/cacti/poller.php
root 17404 0.0 0.1 2420 824 ? S 17:46 0:00 /usr/sbin/cron
wwwrun 17405 0.0 0.2 2524 1100 ? Ss 17:46 0:00 /bin/bash -c /usr/bin/php5 -q /srv/www/htdocs/cacti/poller.php > /var/log/cacti/poller.log 2>&1
wwwrun 17406 6.5 2.4 53340 12436 ? S 17:46 0:01 /usr/bin/php5 -q /srv/www/htdocs/cacti/poller.php
and here is my crontab:
SHELL=/bin/bash
PATH=/usr/bin:/usr/sbin:/sbin:/bin:/usr/lib/news/bin:/usr/local/bin
#
# check scripts in cron.hourly, cron.daily, cron.weekly, and cron.monthly
#
*/1 * * * * wwwrun /usr/bin/php5 -q /srv/www/htdocs/cacti/poller.php > var/log/cacti/poller.log 2>&1
Def. a double poll. Another cron process is spawned right after the first, and both spawn a polling cycle, causing a double poll.
Suggestions?
-
- Cacti User
- Posts: 59
- Joined: Tue Dec 19, 2006 4:35 pm
no go
No go I checked all my cron schedulers; I removed everything that was in cron.daily (mostly logrotate stuff) and restarted cron. A cron process still spawns right after it starts running the poller.
See above for what's in my crontab.
Anyone else experience this odd issue?
See above for what's in my crontab.
Anyone else experience this odd issue?
-
- Cacti User
- Posts: 59
- Joined: Tue Dec 19, 2006 4:35 pm
over time?
Ok, so I killed all of the cron jobs and running php processes, waited a bit, then ran 'time /usr/bin/php5 /srv/www/htdocs/cacti/poller.php'. The poller ends up reporting a total time of about 7 seconds, but the process continues on and ends up running a total of 1min 10 seconds. I think this may be causing an overrun, so another cron process spawns before the last one finished.
Why is the poller reporting such low process times, when the total time to poll is really much higher?
Why is the poller reporting such low process times, when the total time to poll is really much higher?
- fmangeant
- Cacti Guru User
- Posts: 2345
- Joined: Fri Sep 19, 2003 8:36 am
- Location: Sophia-Antipolis, France
- Contact:
Hi
can you try with 2 processes and 15 theads ?
You can see this thread from Reinhard : http://forums.cacti.net/viewtopic.php?p=97180#97180
can you try with 2 processes and 15 theads ?
You can see this thread from Reinhard : http://forums.cacti.net/viewtopic.php?p=97180#97180
[size=84]
[color=green]HOWTOs[/color] :
[list][*][url=http://forums.cacti.net/viewtopic.php?t=15353]Install and configure the Net-SNMP agent for Unix[/url]
[*][url=http://forums.cacti.net/viewtopic.php?t=26151]Install and configure the Net-SNMP agent for Windows[/url]
[*][url=http://forums.cacti.net/viewtopic.php?t=28175]Graph multiple servers using an SNMP proxy[/url][/list]
[color=green]Templates[/color] :
[list][*][url=http://forums.cacti.net/viewtopic.php?t=15412]Multiple CPU usage for Linux[/url]
[*][url=http://forums.cacti.net/viewtopic.php?p=125152]Memory & swap usage for Unix[/url][/list][/size]
[color=green]HOWTOs[/color] :
[list][*][url=http://forums.cacti.net/viewtopic.php?t=15353]Install and configure the Net-SNMP agent for Unix[/url]
[*][url=http://forums.cacti.net/viewtopic.php?t=26151]Install and configure the Net-SNMP agent for Windows[/url]
[*][url=http://forums.cacti.net/viewtopic.php?t=28175]Graph multiple servers using an SNMP proxy[/url][/list]
[color=green]Templates[/color] :
[list][*][url=http://forums.cacti.net/viewtopic.php?t=15412]Multiple CPU usage for Linux[/url]
[*][url=http://forums.cacti.net/viewtopic.php?p=125152]Memory & swap usage for Unix[/url][/list][/size]
-
- Cacti User
- Posts: 59
- Joined: Tue Dec 19, 2006 4:35 pm
Tried without success
I set my poller settings to what is recommended, but to no avial. Poller still takes over 1 minute to complete. Poller reports it takes alot less:
OK u:0.06 s:0.29 r:7.65
OK u:0.06 s:0.29 r:7.66
OK u:0.06 s:0.29 r:7.68
OK u:0.06 s:0.29 r:7.71
04/16/2007 10:50:13 AM - SYSTEM STATS: Time:11.8671 Method:cactid Processes:4 Threads:15 Hosts:54 HostsPerProcess:14 DataSources:1784 RRDsProcessed:917
OK u:0.06 s:0.29 r:7.72
OK u:0.06 s:0.29 r:7.73
OK u:0.06 s:0.29 r:7.73
OK u:0.06 s:0.29 r:7.74
OK u:0.06 s:0.29 r:7.75
And I don't even have that many datasources/hosts!
Only bottleneck I can think of is RAM (Disks are 15k rpm) which is low at 512mb (but, free is telling me it still has 373MB ram free for programs. Most of that is going to disk caching). RAM is on order to up it to 1.5G, but that will take a few weeks to get in.
This looks alot more like a performance issue now.
Thanks thus far for all your suggestions!
OK u:0.06 s:0.29 r:7.65
OK u:0.06 s:0.29 r:7.66
OK u:0.06 s:0.29 r:7.68
OK u:0.06 s:0.29 r:7.71
04/16/2007 10:50:13 AM - SYSTEM STATS: Time:11.8671 Method:cactid Processes:4 Threads:15 Hosts:54 HostsPerProcess:14 DataSources:1784 RRDsProcessed:917
OK u:0.06 s:0.29 r:7.72
OK u:0.06 s:0.29 r:7.73
OK u:0.06 s:0.29 r:7.73
OK u:0.06 s:0.29 r:7.74
OK u:0.06 s:0.29 r:7.75
And I don't even have that many datasources/hosts!
Only bottleneck I can think of is RAM (Disks are 15k rpm) which is low at 512mb (but, free is telling me it still has 373MB ram free for programs. Most of that is going to disk caching). RAM is on order to up it to 1.5G, but that will take a few weeks to get in.
This looks alot more like a performance issue now.
Thanks thus far for all your suggestions!
- TheWitness
- Developer
- Posts: 17007
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
- Contact:
Make sure you don't have extra poller processes running. When you were running 13 process, I just about died. Keep in mind that if you run script server, you also get X PHP Script Server processes and who know's how many scripts running.
This can overwhelm a system quick. If your poller finishes in 11 seconds (as it should) and you continue to receive "OK" messages after the poller completes, this is RRDtool hanging up on RRDupdates.
Please keep in mind that your RRA folder MUST or should never be NFS mounted. If it is, figure something else out, please.
TheWitness
This can overwhelm a system quick. If your poller finishes in 11 seconds (as it should) and you continue to receive "OK" messages after the poller completes, this is RRDtool hanging up on RRDupdates.
Please keep in mind that your RRA folder MUST or should never be NFS mounted. If it is, figure something else out, please.
TheWitness
True understanding begins only when we realize how little we truly understand...
Life is an adventure, let yours begin with Cacti!
Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages
For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
Life is an adventure, let yours begin with Cacti!
Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages
For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
-
- Cacti User
- Posts: 59
- Joined: Tue Dec 19, 2006 4:35 pm
thanks
Thanks for the advice, TheWitness! I didn't know what the limits should be on the concurrent processes, as there isn't anything listed for recommendations in the settings page (at least in 0.8.6i).
According to the Disk I/O, the server is writing about 33mb per minute to disk, which it should have no problem doing with 15k rpm SCSI drives. What else could possibly bottleneck rrdtool?
FYI, the drives are reiserfs.
According to the Disk I/O, the server is writing about 33mb per minute to disk, which it should have no problem doing with 15k rpm SCSI drives. What else could possibly bottleneck rrdtool?
FYI, the drives are reiserfs.
- TheWitness
- Developer
- Posts: 17007
- Joined: Tue May 14, 2002 5:08 pm
- Location: MI, USA
- Contact:
Here is "how" you test things:
1) Run poller.php outside of CRON. All of your RRDupdates will appear on the screen
2) While running, watch for "cactid" processes while using top or ps -ef | grep cactid.
3) When the cactid processes end, time the time it takes for all the "OK" messages to clear off of the screen.
The time difference between the last cactid process completing and the last "OK" message will be disk/rrdupdate delay.
Let me know what you find. Could be you have excellent disks and no disk cache (aka poor drivers), which will make the best disk look like crap.
TheWitness
1) Run poller.php outside of CRON. All of your RRDupdates will appear on the screen
2) While running, watch for "cactid" processes while using top or ps -ef | grep cactid.
3) When the cactid processes end, time the time it takes for all the "OK" messages to clear off of the screen.
The time difference between the last cactid process completing and the last "OK" message will be disk/rrdupdate delay.
Let me know what you find. Could be you have excellent disks and no disk cache (aka poor drivers), which will make the best disk look like crap.
TheWitness
True understanding begins only when we realize how little we truly understand...
Life is an adventure, let yours begin with Cacti!
Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages
For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
Life is an adventure, let yours begin with Cacti!
Author of dozens of Cacti plugins and customization's. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages
For those wondering, I'm still here, but lost in the shadows. Yearning for less bugs. Who want's a Cacti 1.3/2.0? Streams anyone?
-
- Cacti User
- Posts: 59
- Joined: Tue Dec 19, 2006 4:35 pm
Who is online
Users browsing this forum: No registered users and 0 guests