Cacti performance Issues


Post by donClemens »

Hi,

The title makes this look like a pretty common problem, but in this case it seems a bit tricky.
I am certainly no Cacti pro, but I am not a total newbie either; I would say I know roughly what I am doing.
Now to the problem:

Some time ago I set up a Cacti server that is still running 0.8.7i with the plugin architecture, the boost plugin, spine, etc.
This server currently handles about 250k data sources.

Some words about the server setup:

The old server is a VM:
Red Hat Enterprise Linux Server release 5.9 (Tikanga)
4 CPUs / 2 cores each, 12 GB RAM

The new server is also a VM:
Red Hat Enterprise Linux Server release 6.4 (Santiago)
8 CPUs / 2 cores each, 16 GB RAM
currently running Cacti 0.8.8a, also with spine and the boost plugin.

This should become a redundant setup in the future, so pretty much everything is the same except for the server itself (sizing, datacenter, IP, etc.).
That means that, unless I explicitly say otherwise, everything in this post applies to both servers.

There are some devices that regularly send data via files. Those files are parsed and fed into MEMORY tables in a MySQL database for speed, because that is much faster than having a script gather the data directly from the files.

Those tables look a bit like this:

Code: Select all

CREATE TABLE `DMS_WHATEVER_TRK` (
  `name` varchar(20) DEFAULT NULL,
  `direction` char(2) DEFAULT NULL,
  `nccts` int(5) DEFAULT NULL,
  `nwccts` int(5) DEFAULT NULL,
  `incatot` int(5) DEFAULT NULL,
  `prerteab` int(5) DEFAULT NULL,
  `infail` int(5) DEFAULT NULL,
  `nattmpt` int(5) DEFAULT NULL,
  `novflatb` int(5) DEFAULT NULL,
  `glare` int(5) DEFAULT NULL,
  `outfail` int(5) DEFAULT NULL,
  `defldca` int(5) DEFAULT NULL,
  `dreu` int(5) DEFAULT NULL,
  `preu` int(5) DEFAULT NULL,
  `tru` int(5) DEFAULT NULL,
  `sbu` int(5) DEFAULT NULL,
  `mbu` int(5) DEFAULT NULL,
  `outmtchf` int(5) DEFAULT NULL,
  `connect` int(5) DEFAULT NULL,
  `tandem` int(5) DEFAULT NULL,
  `aof` int(5) DEFAULT NULL,
  `anf` int(5) DEFAULT NULL,
  `totu` int(5) DEFAULT NULL,
  `answer` int(5) DEFAULT NULL,
  `acccong` int(5) DEFAULT NULL,
  `noanswer` int(5) DEFAULT NULL,
  `inanswer` int(5) DEFAULT NULL,
  `outansu` int(5) DEFAULT NULL,
  `inansu` int(5) DEFAULT NULL,
  `date` datetime DEFAULT NULL
) ENGINE=MEMORY DEFAULT CHARSET=latin1
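One observation about this schema (an editor's note, not from the original post): there is no index at all, so the per-data-source query shown further down (`SELECT ... WHERE name=... AND date=(SELECT MAX(date) ... WHERE name=...)`) full-scans the MEMORY table twice per call. MEMORY tables default to HASH indexes, so an untested sketch of a cheap thing to try would be:

```sql
-- Hypothetical, untested suggestion: a hash index on `name` turns both
-- the outer lookup and the MAX(date) subquery's filter into index
-- lookups instead of full table scans.
ALTER TABLE `DMS_WHATEVER_TRK` ADD INDEX `idx_name` (`name`) USING HASH;
```

Note that this applies to both servers equally, so it would not explain the 3x delta by itself; it just lowers the per-query cost everywhere.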
To gather data from these tables, script queries are used:

Code: Select all

-bash-4.1$ cat /u01/appl/cacti/resource/script_queries/DMS_TRK.xml
<interface>
        <name>DMS100 Trunkquery</name>
        <description>Get Values for DMS100 Trunk</description>
        <script_path>perl /ramdisk/dbquery_DMS.pl </script_path>
        <arg_prepend>|host_hostname|_TRK</arg_prepend>
        <arg_index>index</arg_index>
        <arg_query>query</arg_query>
        <arg_get>get</arg_get>
        <arg_num_indexes>num_indexes</arg_num_indexes>
        <output_delimeter>:</output_delimeter>
        <index_order_type>alphabetic</index_order_type>
        <index_title_format>|chosen_order_field|</index_title_format>

        <fields>
                <TrkName>
                        <name>Trunkname</name>
                        <direction>input</direction>
                        <query_name>name</query_name>
                </TrkName>
This is pretty much the same on both servers, except that on the new one the script has been put on a ramdisk, while on the old setup it sits on the HDD.
Furthermore, because I did not know what I was doing back then, I originally had one XML file per DMS switch, with the switch name directly in the script_path tag:

Code: Select all

<script_path>perl /u01/appl/cacti/www/previsorinput/DMS.pl DMS_Whatever</script_path>
But this is definitely not the difference that causes the issue; I have already tested the new server with the one-file-per-switch variant as well.
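Either way, for context: with the `<arg_prepend>|host_hostname|_TRK</arg_prepend>` form above, spine ends up building one command line per data source, roughly like this (table, field, and trunk name below are lifted from the poller log further down, not a live poll):

```shell
# One invocation per data source; spine passes the prepended table
# name, the action ("get"), the field, and the index (trunk name):
perl /ramdisk/dbquery_DMS.pl DMS_AMSTNLA201A_TRK get sbu NL01PTFLXB092
```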

What the Perl script does:

Code: Select all

if ($ARGV[1] eq "get") {
  $_=`/usr/bin/mysql -uroot -pcnoNMS -s --skip-column-names -e 'use DMS_OMdata; SELECT $ARGV[2] from $ARGV[0] where name="$ARGV[3]" and date=(select max(date) from $ARGV[0] where name="$ARGV[3]");'`;
  chomp;
  print "$_";
#if ("$_" eq "") {
#  sleep(1);
#  $_=`/usr/bin/mysql -uroot -pcnoNMS -s --skip-column-names -e 'use DMS_OMdata; SELECT $ARGV[2] from $ARGV[0] where name="$ARGV[3]" and date=(select max(date) from $ARGV[0] where name="$ARGV[3]");'`;
#  chomp;
#  print "$_";
#  exit 0; }
  exit 0;
}
Pretty simple code, just gathering data from the DB, optimized for speed.
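One thing worth keeping in mind at this scale (an editor's aside, not a measurement from the post): every data source costs a fork+exec of perl plus a second fork+exec of the /usr/bin/mysql client, so 15k data sources mean roughly 30k process launches per polling cycle. A rough, self-contained way to compare the fixed per-launch overhead on both VMs:

```shell
# Micro-benchmark sketch: time 100 no-op perl launches to estimate the
# per-invocation fork/exec cost that every poller script call pays
# (twice, in fact: once for perl, once for the mysql client).
start=$(date +%s%N)
for i in $(seq 1 100); do perl -e '1'; done
end=$(date +%s%N)
echo "100 perl startups: $(( (end - start) / 1000000 )) ms"
```

If this number alone differs noticeably between the two VMs, that difference multiplies across 15k data sources and could account for a large part of the 30s-vs-100s gap.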

Now, what is the problem:

The old server, with less hardware, processes a request over 3 DMS switches with spine within ~30 seconds (15k data sources -> 15k MySQL queries).
Exactly the same 3 hosts take about 100 seconds on the new host.

Here is the my.cnf of the old server:

Code: Select all

-bash-3.2$ cat /etc/my.cnf
[mysqld]
datadir=/u01/appl/cacti/www/mysql/
# socket=/u01/appl/cacti/www/mysql/mysql.sock
# datadir=/var/lib/mysql/
socket=/var/lib/mysql/mysql.sock
user=mysql
max_connections=2500
query_cache_limit  = 64M
query_cache_type = 1
query_cache_size = 256M
join_buffer_size = 256K
thread_cache_size = 8
skip-locking
key_buffer = 512M
query_cache_size = 128M
max_allowed_packet = 16M
table_cache = 1024
sort_buffer_size = 128M
net_buffer_length = 8K
read_buffer_size = 1M
read_rnd_buffer_size = 32M
myisam_sort_buffer_size = 8M
max_heap_table_size = 4G
tmp_table_size=1G;
log_slow_queries
long_query_time = 2
log_long_format
innodb_buffer_pool_size = 256M
skip-innodb
skip-bdb

# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
# symbolic-links=0

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
And this is the my.cnf of the new server:

Code: Select all

-bash-4.1$ cat /etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
max_connections=8196
query_cache_limit  = 64M
query_cache_type = 1
query_cache_size = 256M
join_buffer_size = 256K
thread_cache_size = 8
skip-locking
key_buffer = 512M
query_cache_size = 128M
max_allowed_packet = 16M
table_cache = 8192
sort_buffer_size = 128M
net_buffer_length = 8K
read_buffer_size = 16M
read_rnd_buffer_size = 32M
myisam_sort_buffer_size = 128M
max_heap_table_size = 6G
thread_cache_size = 16
thread_concurrency = 8
tmp_table_size=1G;
log_slow_queries
long_query_time = 2
log_long_format
innodb_buffer_pool_size = 256M
skip-innodb



# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
I guess MySQL is not the bottleneck, since I have also tested with exactly the same config on both servers, and that did not make a difference either.
New server MySQL version:
mysql Ver 14.14 Distrib 5.1.69, for redhat-linux-gnu (x86_64) using readline 5.1

Old one:
mysql Ver 14.12 Distrib 5.0.95, for redhat-linux-gnu (x86_64) using readline 5.1
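Since the script shells out to /usr/bin/mysql for every single value, client startup + connect + auth time matters just as much as the query itself, and it can differ between the 5.0 and 5.1 client builds. A hedged probe to separate the two on each server (the password is a placeholder):

```shell
# Sketch: time 100 bare mysql client round-trips with a trivial query;
# if the new server is ~3x slower here too, the overhead is in client
# startup/connect rather than in Cacti, spine, or the MEMORY tables.
time ( for i in $(seq 1 100); do
  /usr/bin/mysql -uroot -pPASSWORD -s -e 'SELECT 1;' >/dev/null
done )
```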

On the old server the poller output looks a bit like this:
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[27] DEBUG: The NIFTY POPEN returned the following File Descriptor 8
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[28] TH[1] DS[155034] SCRIPT: perl /u01/appl/cacti/www/previsorinput/DMS.pl DMS_ARNHNLUT01H get outfail NL02RUN_B003, output: 0
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[28] DEBUG: The NIFTY POPEN returned the following File Descriptor 6
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[27] TH[1] DS[201981] SCRIPT: perl /u01/appl/cacti/www/previsorinput/DMS.pl DMS_AMSTNLA201A get outfail NL01SMCZ_B003, output: 0
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[27] DEBUG: The NIFTY POPEN returned the following File Descriptor 8
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[28] TH[1] DS[155037] SCRIPT: perl /u01/appl/cacti/www/previsorinput/DMS.pl DMS_ARNHNLUT01H get infail NL02SAX__B001, output: 1
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[28] DEBUG: The NIFTY POPEN returned the following File Descriptor 6
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[27] TH[1] DS[204677] SCRIPT: perl /u01/appl/cacti/www/previsorinput/DMS.pl DMS_AMSTNLA201A get nwccts NL01UNAM_B011, output: 30
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[27] DEBUG: The NIFTY POPEN returned the following File Descriptor 8
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[28] TH[1] DS[155037] SCRIPT: perl /u01/appl/cacti/www/previsorinput/DMS.pl DMS_ARNHNLUT01H get outfail NL02SAX__B001, output: 0
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[28] DEBUG: The NIFTY POPEN returned the following File Descriptor 6
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[27] TH[1] DS[204680] SCRIPT: perl /u01/appl/cacti/www/previsorinput/DMS.pl DMS_AMSTNLA201A get connect NL01UNAM_B011, output: 0
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[27] DEBUG: The NIFTY POPEN returned the following File Descriptor 8
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[28] TH[1] DS[155040] SCRIPT: perl /u01/appl/cacti/www/previsorinput/DMS.pl DMS_ARNHNLUT01H get infail NL02SAX__B004, output: 1
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[28] DEBUG: The NIFTY POPEN returned the following File Descriptor 6
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[27] TH[1] DS[205262] SCRIPT: perl /u01/appl/cacti/www/previsorinput/DMS.pl DMS_AMSTNLA201A get nattmpt NL01AXFLXB002, output: 0
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[27] DEBUG: The NIFTY POPEN returned the following File Descriptor 8
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[28] TH[1] DS[155040] SCRIPT: perl /u01/appl/cacti/www/previsorinput/DMS.pl DMS_ARNHNLUT01H get outfail NL02SAX__B004, output: 0
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[28] DEBUG: The NIFTY POPEN returned the following File Descriptor 6
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[27] TH[1] DS[204644] SCRIPT: perl /u01/appl/cacti/www/previsorinput/DMS.pl DMS_AMSTNLA201A get nccts NL01NUOCCB039, output: 30
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[27] DEBUG: The NIFTY POPEN returned the following File Descriptor 8
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[28] TH[1] DS[155043] SCRIPT: perl /u01/appl/cacti/www/previsorinput/DMS.pl DMS_ARNHNLUT01H get infail NL02SGM__B001, output: 0
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[28] DEBUG: The NIFTY POPEN returned the following File Descriptor 6
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[27] TH[1] DS[204641] SCRIPT: perl /u01/appl/cacti/www/previsorinput/DMS.pl DMS_AMSTNLA201A get answer NL01NUOCCB039, output: 0
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[27] DEBUG: The NIFTY POPEN returned the following File Descriptor 8
07/22/2013 02:07:24 PM - SPINE: Poller[0] Host[28] TH[1] DS[155043] SCRIPT: perl /u01/appl/cacti/www/previsorinput/DMS.pl DMS_ARNHNLUT01H get outfail NL02SGM__B001, output: 0
On the new one:
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[22] TH[6] DS[61995] SCRIPT: perl /ramdisk/dbquery_DMS.pl DMS_ARNHNLUT01H_UTIL get util NL02UMCGNB015, output: 0.00000
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[21] DEBUG: The NIFTY POPEN returned the following File Descriptor 25
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[22] DEBUG: The NIFTY POPEN returned the following File Descriptor 8
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[21] TH[3] DS[36424] SCRIPT: perl /ramdisk/dbquery_DMS.pl DMS_AMSTNLA201A_TRK get sbu NL01PTFLXB092, output: 0
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[21] DEBUG: The NIFTY POPEN returned the following File Descriptor 6
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[21] TH[2] DS[35214] SCRIPT: perl /ramdisk/dbquery_DMS.pl DMS_AMSTNLA201A_TRK get defldca NL01NUOCCB019, output: 0
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[21] TH[4] DS[37614] SCRIPT: perl /ramdisk/dbquery_DMS.pl DMS_AMSTNLA201A_TRK get defldca NL01UNA__B007, output: 0
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[21] DEBUG: The NIFTY POPEN returned the following File Descriptor 26
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[21] DEBUG: The NIFTY POPEN returned the following File Descriptor 12
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[22] TH[1] DS[63988] SCRIPT: perl /ramdisk/dbquery_DMS.pl DMS_ARNHNLUT01H_TRK get answer UNL02NL01IMTSN, output: 6
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[21] TH[1] DS[34008] SCRIPT: perl /ramdisk/dbquery_DMS.pl DMS_AMSTNLA201A_TRK get nattmpt NL01CAT__B007, output: 30
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[22] DEBUG: The NIFTY POPEN returned the following File Descriptor 7
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[21] DEBUG: The NIFTY POPEN returned the following File Descriptor 29
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[22] TH[6] DS[61996] SCRIPT: perl /ramdisk/dbquery_DMS.pl DMS_ARNHNLUT01H_TRK get answer NL02UMCU_B001, output: 43
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[22] DEBUG: The NIFTY POPEN returned the following File Descriptor 8
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[22] TH[5] DS[61079] SCRIPT: perl /ramdisk/dbquery_DMS.pl DMS_ARNHNLUT01H_TRK get connect NL02SIZ__B003, output: 2
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[22] DEBUG: The NIFTY POPEN returned the following File Descriptor 9
07/22/2013 02:07:12 PM - SPINE: Poller[0] Host[22] TH[3] DS[59073] SCRIPT: perl /ramdisk/dbquery_DMS.pl DMS_ARNHNLUT01H_TRK get mbu NL02DRG__B001, output: 0
So what I see is that there are more of those NIFTY POPEN messages, but I am not sure whether that is what causes the roughly 3x longer runtime.
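To check whether the extra NIFTY POPEN chatter is real or just log interleaving, counting those debug lines over one polling cycle on both servers gives a concrete number (the log path is the one spine reports; adjust if yours differs):

```shell
# Count popen invocations spine logged; compare per-cycle totals
# between the old and the new server.
grep -c 'NIFTY POPEN' /u01/appl/cacti/log/cacti.log
```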


Oh, and here is some spine debug info, if needed:
-bash-4.1$ ./spine -R -f 200 -l 200
SPINE: Using spine config file [spine.conf]
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The path_php_server variable is /u01/appl/cacti/script_server.php
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The path_cactilog variable is /u01/appl/cacti/log/cacti.log
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The log_destination variable is 1 (FILE)
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The path_php variable is /usr/bin/php
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The availability_method variable is 2
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The ping_recovery_count variable is 3
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The ping_failure_count variable is 2
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The ping_method variable is 2
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The ping_retries variable is 1
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The ping_timeout variable is 400
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The snmp_retries variable is 3
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The log_perror variable is 1
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The log_pwarn variable is 1
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The boost_redirect variable is 1
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The log_pstats variable is 1
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The threads variable is 20
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The polling interval is 300 seconds
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The number of concurrent processes is 8
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The script timeout is 25
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The number of php script servers to run is 1
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: StartHost='200', EndHost='200', TotalPHPScripts='0'
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The PHP Script Server is Not Required
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The Maximum SNMP OID Get Size is 100
07/22/2013 02:15:25 PM - SPINE: Poller[0] Version 0.8.8a starting
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: MySQL is Thread Safe!
07/22/2013 02:15:25 PM - SPINE: Poller[0] WARNING: Spine NOT running asroot. This is required if using ICMP. Please run "chmod +s;chown root:root spine" to resolve.
07/22/2013 02:15:25 PM - SPINE: Poller[0] SPINE: Initializing Net-SNMP API
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: Issues with SNMP Header Version information, assuming old version of Net-SNMP.
07/22/2013 02:15:25 PM - SPINE: Poller[0] SPINE: Initializing PHP Script Server(s)
07/22/2013 02:15:25 PM - SPINE: Poller[0] NOTE: Spine will support multithread device polling.
07/22/2013 02:15:25 PM - SPINE: Poller[0] NOTE: Spine is behaving in a 0.8.7g+ manner
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: Initial Value of Active Threads is 0
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: Valid Thread to be Created
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 1
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: In Poller, About to Start Polling of Host
07/22/2013 02:15:25 PM - SPINE: Poller[0] Host[0] TH[1] Total Time: 0.0027 Seconds
07/22/2013 02:15:25 PM - SPINE: Poller[0] Host[0] TH[1] DEBUG: HOST COMPLETE: About to Exit Host Polling Thread Function
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: The Value of Active Threads is 0
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: Thread Cleanup Complete
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: PHP Script Server Pipes Closed
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: Allocated Variable Memory Freed
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: MYSQL Free & Close Completed
07/22/2013 02:15:25 PM - SPINE: Poller[0] DEBUG: Net-SNMP Close Completed
07/22/2013 02:15:25 PM - SPINE: Poller[0] Time: 0.1215 s, Threads: 20, Hosts: 1
So, does anyone have an idea why the new setup is so much slower even though the hardware is better?
Or maybe an idea of what I could check to find out what causes the slower polling?