cactid - weird performance issues

TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

I have provided another release of SVN Cactid tonight. Please test and post results.

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?
jennam
Posts: 8
Joined: Wed Jul 12, 2006 10:13 pm
Location: Adelaide, Australia

Post by jennam »

TheWitness,

Any thoughts on my large .RRAs, etc?

I don't get the feeling that the latest cactid will help me that greatly.

Was that lib/rrd.php also intended for me to try?

Any thoughts on improving RRDtool performance if my (very large) RRAs are to blame?

Any thoughts/recommendations on better values for the RRAs? I wrote a perl tool some time ago that pipes through rrdtool export/import to resize RRAs, so I'm not entirely fussed if I have to change them to something different.


Jenna
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

Jenna,

It is likely not the large RRD files that you mentioned. However, I would like you to test this.

Also, I would like to know what OS and OS version you are running and your version of MySQL, Net-SNMP, GCC and GLIBC. Please respond.

Thanks,

TheWitness
jennam
Posts: 8
Joined: Wed Jul 12, 2006 10:13 pm
Location: Adelaide, Australia

Post by jennam »

TheWitness,

The system is an HP DL380 G1: dual PIII 1GHz, 1GB of memory, redundant failover NICs, and a pair of 18GB disks in RAID 1.

# cat /etc/redhat-release && uname -a
Red Hat Enterprise Linux ES release 4 (Nahant Update 3)
Linux rtrhw02 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:32:14 EDT 2005 i686 i686 i386 GNU/Linux

# rpm -qa | egrep '^(gcc|glibc|net-snmp|mysql)' | sort
gcc-3.4.5-2

glibc-2.3.4-2.19
glibc-common-2.3.4-2.19
glibc-devel-2.3.4-2.19
glibc-headers-2.3.4-2.19
glibc-kernheaders-2.4-9.1.98.EL

mysql-4.1.12-3.RHEL4.1
mysql-devel-4.1.12-3.RHEL4.1
mysql-server-4.1.12-3.RHEL4.1

net-snmp-5.1.2-11.EL4.6
net-snmp-devel-5.1.2-11.EL4.6
net-snmp-libs-5.1.2-11.EL4.6
net-snmp-utils-5.1.2-11.EL4.6

I've still got some concerns with regards to the sheer size of the RRD files on the system.

# pwd && du -h && ls -1 | wc -l
/var/www/html/cacti/rra
9.8G .
6814

I'm quite happy to hear thoughts/recommendations/ideas with regard to resizing these RRD files. I wrote a tool some time ago that can size the RRAs of the RRDs up or down as required, so it's not that big a deal, but I'd rather find something and stick to it across all RRDs.
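
(For anyone following along, the manual equivalents look roughly like this. The filename, RRA index and row count below are only placeholders, not values from this system.)

# 'rrdtool resize' changes the row count of a single RRA (numbered from 0)
# and writes the result to ./resize.rrd, which then replaces the original:
cd /var/www/html/cacti/rra
rrdtool resize devicename_traffic_in_1234.rrd 0 SHRINK 100000
mv resize.rrd devicename_traffic_in_1234.rrd

# The dump/restore route, which is what a scripted resizer generally pipes through:
rrdtool dump devicename_traffic_in_1234.rrd > /tmp/ds.xml
# ... add or remove <row> entries inside each <database> section of the XML ...
rrdtool restore /tmp/ds.xml /tmp/devicename_traffic_in_1234.rrd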

The biggest reason I suspect the RRDs is that, from my earlier testing, cactid run from the command line seems to run and exit quite quickly; however, when run from the poller.php process it takes so much longer. I can only assume this extra time is spent within MySQL, the poller table, and writing that data into the RRDs.

Another daunting thought is the efficiency of RRDtool itself, and whether on each 5-minute run the host is being forced to re-write essentially 10GB of data to disk. That brings up obvious concerns about disk I/O bottlenecks.

Anyways, I look forward to any further thoughts you have.


Jenna
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

I am not too sure. What I can say is that others have had problems such as this. So, my next question is: what version of rrdtool are you running? When you ran configure on the package, did it show memory-mapped I/O enabled? If so, it should not be writing all 10GB, just the changed elements.
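
If you're not sure how it was built, one rough (not definitive) check is to attach strace to the rrdtool pipe process during a poller run and watch whether the updates go through mmap()/msync() or plain lseek()/write():

# Grabs the first rrdtool PID only - adjust if more than one pipe is running.
strace -tt -e trace=open,read,write,_llseek,mmap2,munmap,msync,close \
    -p "$(pidof rrdtool | awk '{print $1}')"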

You can try increasing the amount of memory. One way to isolate the problem is to do the following (a rough combined loop is sketched after this list):

1) Watch the cactid processes while poller.php is running in one window and note when they exit.
2) Have the MySQL Admin tool in another window watching the connection status and observe whether MySQL is getting stuck.
3) Have a window up in MySQL Query Browser that shows the number of rows in the poller_output table: "select count(*) from poller_output;"
4) Watch the rrdtool process and note its status.

Then, if you see that 3) is not decrementing (i.e. is stuck), refresh the connection status and see if there is a thread sitting in a wait state. That would be an indication of a MySQL issue. If that is not the case, perform an strace on the rrdtool pipe and see if it's hung...
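
If you don't have the GUI tools handy, something along these lines in a spare shell covers 1), 3) and 4) in one place (this assumes the default 'cacti' database name and that the mysql command can connect without extra options; adjust credentials to suit):

# Rough monitoring loop - run while poller.php is active.
while true; do
    date
    ps -ef | egrep '[c]actid|[r]rdtool -'
    mysql cacti -e 'select count(*) from poller_output;'
    sleep 5
done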

You get the idea. I am thinking of a plugin for this right now. I do something similar in MacTrack today, where you can actually observe the polling as it takes place. It's pretty cool.

TheWitness
jennam
Posts: 8
Joined: Wed Jul 12, 2006 10:13 pm
Location: Adelaide, Australia

Post by jennam »

TheWitness

Unfortunately, as with the majority of the software on this RHEL4 box, we try to keep things built as packages for ease of management:

# rpm -qa | grep rrdtool
perl-rrdtool-1.2.12-1.2.el4.test
rrdtool-1.2.12-1.2.el4.test
rrdtool-devel-1.2.12-1.2.el4.test

# rrdtool -v
RRDtool 1.2.12 Copyright 1997-2005 by Tobias Oetiker <tobi@oetiker.ch>

In regard to 'increasing the amount of memory', do you mean physically increasing it? That doesn't seem necessary to me - the system has 1GB of RAM, doesn't hit swap, and hell, has 800MB assigned to 'cache'. It doesn't seem memory-bound at all.

1. Looking at cacti.log and grepping the appropriate lines:
07/18/2006 11:10:14 AM - CACTID: Poller[0] Time: 12.0987 s, Threads: 20, Hosts: 72
07/18/2006 11:10:16 AM - CACTID: Poller[0] Time: 14.8981 s, Threads: 20, Hosts: 71
07/18/2006 11:10:18 AM - CACTID: Poller[0] Time: 15.4152 s, Threads: 20, Hosts: 70
07/18/2006 11:13:57 AM - SYSTEM STATS: Time:236.1299 Method:cactid Processes:3 Threads:20 Hosts:211 HostsPerProcess:71 DataSources:11611 RRDsProcessed:6551

From the poller log we see the same exit stats:
OK u:0.85 s:3.19 r:232.46
OK u:0.85 s:3.19 r:232.50
OK u:0.85 s:3.19 r:232.54
OK u:0.85 s:3.19 r:232.58
OK u:0.85 s:3.19 r:232.61
07/18/2006 11:13:57 AM - SYSTEM STATS: Time:236.1299 Method:cactid Processes:3 Threads:20 Hosts:211 HostsPerProcess:71 DataSources:11611 RRDsProcessed:6551


2. I don't know of the 'MySQL Admin tool'; can you indicate which commands I should be running from the MySQL CLI instead?

3. Did a few of these with gaps in-between:
mysql> select count(*) from poller_output;
+----------+
| count(*) |
+----------+
| 9738 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from poller_output;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)

4. I ran a quick while loop looking for cactid/rrdtool processes with a one-second pause between runs (a rough reconstruction of the loop follows the output below). Interesting results:

500 31388 1 0 11:10 ? 00:00:00 /usr/bin/cactid 0 74
500 31388 1 13 11:10 ? 00:00:00 /usr/bin/cactid 0 74
500 31400 1 9 11:10 ? 00:00:00 /usr/bin/cactid 75 148
500 31388 1 11 11:10 ? 00:00:00 /usr/bin/cactid 0 74
500 31400 1 7 11:10 ? 00:00:00 /usr/bin/cactid 75 148
500 31419 1 9 11:10 ? 00:00:00 /usr/bin/cactid 149 228
500 31388 1 9 11:10 ? 00:00:00 /usr/bin/cactid 0 74
500 31400 1 8 11:10 ? 00:00:00 /usr/bin/cactid 75 148
500 31419 1 8 11:10 ? 00:00:00 /usr/bin/cactid 149 228
500 31444 31385 0 11:10 ? 00:00:00 /usr/bin/rrdtool -

... etc ..

500 31444 31385 3 11:10 ? 00:00:00 /usr/bin/rrdtool -
500 31419 1 8 11:10 ? 00:00:01 /usr/bin/cactid 149 228
500 31444 31385 3 11:10 ? 00:00:00 /usr/bin/rrdtool -
500 31419 1 8 11:10 ? 00:00:01 /usr/bin/cactid 149 228
500 31444 31385 3 11:10 ? 00:00:00 /usr/bin/rrdtool -
500 31419 1 7 11:10 ? 00:00:01 /usr/bin/cactid 149 228
500 31444 31385 2 11:10 ? 00:00:00 /usr/bin/rrdtool -
500 31444 31385 2 11:10 ? 00:00:00 /usr/bin/rrdtool -
500 31444 31385 2 11:10 ? 00:00:00 /usr/bin/rrdtool -
500 31444 31385 2 11:10 ? 00:00:00 /usr/bin/rrdtool -
500 31444 31385 2 11:10 ? 00:00:00 /usr/bin/rrdtool -
500 31444 31385 2 11:10 ? 00:00:00 /usr/bin/rrdtool -
500 31444 31385 2 11:10 ? 00:00:00 /usr/bin/rrdtool -

*lots of time / repeats of the same rrdtool PID*

500 31444 31385 1 11:10 ? 00:00:03 /usr/bin/rrdtool -
500 31444 31385 1 11:10 ? 00:00:03 /usr/bin/rrdtool -
500 31444 31385 1 11:10 ? 00:00:03 /usr/bin/rrdtool -
500 31444 31385 1 11:10 ? 00:00:04 /usr/bin/rrdtool -
500 31444 31385 1 11:10 ? 00:00:04 [rrdtool] <defunct>
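
(For reference, the loop was just something along these lines - reconstructing from memory, the exact ps options may have differed:)

while true; do
    ps -ef | egrep '[c]actid|[r]rdtool'
    sleep 1
done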

An strace of the PID shows that it's iterating through and updating each of the separate .rrd files quite heavily:

getrusage(RUSAGE_SELF, {ru_utime={0, 371943}, ru_stime={1, 971700}, ...}) = 0
gettimeofday({1153187234, 993813}, {4294966726, 0}) = 0
write(1, "OK u:0.37 s:1.97 r:129.29\n", 26) = 26
open("/var/www/html/cacti-0.8.6h/rra/devicename_1min_cpu_5360.rrd", O_RDWR) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=866656, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ffd000
read(4, "RRD\0000003\0\0\0\0/%\300\307C+\37[\1\0\0\0\f\0\0\0,\1"..., 4096) = 4096
gettimeofday({1153187234, 998246}, NULL) = 0
_llseek(4, 0, [4096], SEEK_CUR) = 0
fstat64(4, {st_mode=S_IFREG|0644, st_size=866656, ...}) = 0
_llseek(4, 864256, [864256], SEEK_SET) = 0
read(4, "\0\0\0\0\0\0\370\377\0\0\0\0\0\0\370\377\0\0\0\0\0\0\370"..., 4096) = 2400
_llseek(4, 0, [0], SEEK_SET) = 0
read(4, "RRD\0000003\0\0\0\0/%\300\307C+\37[\1\0\0\0\f\0\0\0,\1"..., 4096) = 4096
fcntl64(4, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
mmap2(NULL, 866656, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) = 0xb7f24000
munmap(0xb7f24000, 866656) = 0
_llseek(4, 4096, [4096], SEEK_SET) = 0
_llseek(4, -2568, [1528], SEEK_CUR) = 0
write(4, " =\274D\0\0\0\0UNKN\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1128) = 1128
close(4) = 0
munmap(0xb7ffd000, 4096)

... and repeated for each RRD.

So this seems like purely a performance issue, in that we only have a single RRDtool process updating. It certainly doesn't seem to get 'stuck' at all.

'iostat 1' shows:

avg-cpu: %user %nice %sys %iowait %idle
1.00 0.00 2.49 47.76 48.76

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
ida/c0d0 277.78 31062.63 0.00 30752 0
ida/c0d0p1 0.00 0.00 0.00 0 0
ida/c0d0p2 294.95 31062.63 0.00 30752 0

avg-cpu: %user %nice %sys %iowait %idle
2.00 0.00 2.50 52.00 43.50

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
ida/c0d0 629.00 32192.00 840.00 32192 840
ida/c0d0p1 0.00 0.00 0.00 0 0
ida/c0d0p2 675.00 32192.00 840.00 32192 840


From looking at the manpage, I'm guessing that translates to around 14MB/s transfer rates. That doesn't seem -particularly- high for SCSI disks, but it's certainly not that low either. Either way, with 40-50% I/O wait this suggests a disk I/O bottleneck to me.
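
(As a rough cross-check - the iostat block counts are 512-byte sectors, so the read figure above works out to the same ballpark:)

echo "scale=1; 31062 * 512 / 1024 / 1024" | bc   # prints 15.1, i.e. roughly 15MB/s of reads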

The strange thing to me is that, from the sort of stats I've seen, we shouldn't be seeing that level of I/O with this many hosts/RRDs. I can only put it down to the sheer size of the .rrd files that we have here.

Any input you can provide from these statistics would be appreciated.


Jenna
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

Jenna,

If while polling you repeatedly run:

select count(*) from poller_output;

You will get an idea of how fast rrdtool is processing. I do have a patch for multiple rrdtool server processes, but I think you might find that it won't speed things up.

Larry
jennam
Posts: 8
Joined: Wed Jul 12, 2006 10:13 pm
Location: Adelaide, Australia

Post by jennam »

Larry,

When I ran 'select count(*) from poller_output;' during one of the runs, it just slowly (over 10-20 seconds) climbed to the peak figure of 9000 or so and then instantly dropped to 0, with nothing in between. Even when it was zero there were still RRDtool processes running.

I suspect that on a dual-CPU system, having two or three rrdtool processes may give a performance improvement. Despite the high I/O wait there is still a level of "idle" there, so it seems plausible there's even better performance to be had.

Any clues at this stage on where my performance problems lie, or "where to" from here (recompile rrdtool, etc.)?


Jenna
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

We need to do this online. :(

Larry