How to improve my Cacti server? It has really poor performance

ocon
Posts: 25
Joined: Fri May 16, 2008 2:38 pm

Post by ocon »

Hello All,

I would like to know if anyone has any advice on how to improve my server's performance. We are using an HP ProLiant DL585, 5th generation, with 20 GB of RAM and 8 Intel Xeon CPUs (E7330 @ 2.40GHz), running CentOS 5 x64, Cacti 0.8.7b and Boost 1.7.

Here are my system stats:

07/22/2008 08:34:35 AM - SYSTEM STATS: Time:273.6458 Method:spine Processes:1 Threads:60 Hosts:1030 HostsPerProcess:1030 DataSources:9349 RRDsProcessed:0

I was having lots of issues: the poller was reaching the 298-second limit and aborting, so my graphs had lots of gaps. I did some tuning on the MySQL database, mostly increasing the buffer sizes, but I think the server's performance can still be improved.

The average number of processes is 285, CPU usage is around 30%, the load average is around 7 and RAM usage is below 6 GB (5 GB cache, <1 GB memory buffers). As you can see there are fewer than 10,000 data sources, and I have seen posts around the forums from smaller servers handling more load, so any advice is welcome.
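To compare poller runs like the one above, the SYSTEM STATS line can be parsed mechanically. A minimal Python sketch, assuming the `Key:Value` layout shown in the log line and a 300-second polling interval (consistent with the 298-second poller limit mentioned above); `parse_system_stats` and `interval_usage` are hypothetical helpers, not part of Cacti.

```python
import re

# Matches "Key:Value" pairs such as "Time:273.6458" or "Method:spine".
STAT_PAIR = re.compile(r"(\w+):([\w.]+)")

def parse_system_stats(line: str) -> dict:
    """Return the pairs after 'SYSTEM STATS:' with numeric values converted."""
    _, _, stats_part = line.partition("SYSTEM STATS:")
    stats = {}
    for key, value in STAT_PAIR.findall(stats_part):
        try:
            stats[key] = int(value)
        except ValueError:
            try:
                stats[key] = float(value)
            except ValueError:
                stats[key] = value  # non-numeric, e.g. Method:spine
    return stats

def interval_usage(stats: dict, interval: float = 300.0) -> float:
    """Fraction of the (assumed) polling interval used by one poller run."""
    return stats["Time"] / interval

if __name__ == "__main__":
    line = ("07/22/2008 08:34:35 AM - SYSTEM STATS: Time:273.6458 Method:spine "
            "Processes:1 Threads:60 Hosts:1030 HostsPerProcess:1030 "
            "DataSources:9349 RRDsProcessed:0")
    stats = parse_system_stats(line)
    print(f"{interval_usage(stats):.1%} of the interval used")
```

For the run above this comes to roughly 91% of a 300-second cycle, which leaves very little margin before the poller hits its limit.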

Regards,
Ocon
Attachments
poller.JPG: These are my poller settings.
boost.JPG: Boost statistics.
Linegod
Developer
Posts: 1626
Joined: Thu Feb 20, 2003 10:16 am
Location: Canada

Post by Linegod »

Change Maximum Concurrent Poller Processes to '3'.

Run the following against your database and post the output:

Code:

SELECT ACTION , count( * ) AS count FROM `poller_item` GROUP BY `action`
This will give, in order, the number of actions that are SNMP, script, and script server.
--
Live fast, die young
You're sucking up my bandwidth.

J.P. Pasnak,CD
CCNA, LPIC-1
http://www.warpedsystems.sk.ca
ocon
Posts: 25
Joined: Fri May 16, 2008 2:38 pm

Post by ocon »

I changed the value to 3 as advised. I had read in the documentation that it could be 1 or 2 per processor (http://docs.cacti.net/node/519), but that did not work, so I switched back to 1. I am now on 3; I will keep you posted.

Also, thank you for your query!

mysql> SELECT ACTION , count( * ) AS count FROM `poller_item` GROUP BY `action` ;
+--------+-------+
| ACTION | count |
+--------+-------+
|      0 |  1968 |
|      1 |  4097 |
|      2 |  3316 |
+--------+-------+
3 rows in set (0.00 sec)

I was just trying to get these numbers the hard way, so thanks for the tip. I already knew the server had a lot of scripts as data sources; now I have a number.

Do you think this might be the problem? (I know scripts are a lot slower than SNMP.)

Thank you again!
ocon
Posts: 25
Joined: Fri May 16, 2008 2:38 pm

Post by ocon »

As soon as I changed Maximum Concurrent Poller Processes, I got this in my log:

07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - SPINE: Poller[0] FATAL: Connection Failed: Too many connections (Spine thread)
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - SPINE: Poller[0] ERROR: SS[999] Script Server did not start properly return message was: 'FATAL: Cannot connect to MySQL server on 'localhost'. Please make sure you have specified a valid MySQL database name in 'include/config.php''
07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating
07/22/2008 11:20:03 AM - SPINE: Poller[0] FATAL: Connection Failed: Too many connections (Spine thread)
07/22/2008 11:20:03 AM - SPINE: Poller[0] ERROR: SS[999] Script Server did not start properly return message was: 'FATAL: Cannot connect to MySQL server on 'localhost'. Please make sure you have specified a valid MySQL database name in 'include/config.php''

Many times... and of course, the gaps started to show...
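When a log repeats the same line many times, collapsing duplicates makes the rarer but more telling messages (here the SPINE "Too many connections" errors) easier to spot. A small Python sketch, assuming each line has the `timestamp - message` layout shown above; `summarize_log` is a hypothetical helper, not part of Cacti.

```python
from collections import Counter

def summarize_log(lines):
    """Count identical log messages, ignoring the leading timestamp."""
    counts = Counter()
    for line in lines:
        # Lines look like "<date> <time> <AM|PM> - <message>".
        _, sep, message = line.partition(" - ")
        if sep:  # skip lines without the separator, e.g. blank lines
            counts[message.strip()] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating",
        "07/22/2008 11:20:03 AM - PHPSVR: Poller[0] ERROR: Input Expected, Script Server Terminating",
        "07/22/2008 11:20:03 AM - SPINE: Poller[0] FATAL: Connection Failed: Too many connections (Spine thread)",
    ]
    for message, n in summarize_log(sample).most_common():
        print(f"{n:5d}  {message}")
```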

Please advise :D
Ocon
Linegod
Developer
Posts: 1626
Joined: Thu Feb 20, 2003 10:16 am
Location: Canada

Post by Linegod »

ocon wrote: I was just trying to get these numbers the hard way, so thanks for the tip. I already knew the server had a lot of scripts as data sources; now I have a number.

Do you think this might be the problem? (I know scripts are a lot slower than SNMP.)
I would lean towards the plain script ones being a problem. For reference, here are the stats on one of our servers:

Code:

07/22/2008 01:50:54 PM - SYSTEM STATS: Time:53.6681 Method:spine Processes:3 Threads:8 Hosts:565 HostsPerProcess:189 DataSources:14759 RRDsProcessed:8311
+--------+-------+
| ACTION | count |
+--------+-------+
|      0 | 10709 |
|      1 |   230 |
|      2 |  3820 |
+--------+-------+
Removing the plain scripts, or converting them to script server scripts, should increase your capacity.
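The difference between the two servers is easier to see as proportions. A small Python sketch, using the action-code meaning given earlier in the thread (0 = SNMP, 1 = script, 2 = script server) and the two result sets posted here; `action_breakdown` is a hypothetical helper name.

```python
# Poller action codes as described in this thread.
ACTION_NAMES = {0: "snmp", 1: "script", 2: "script_server"}

def action_breakdown(counts: dict) -> dict:
    """Map {action_code: count} to {name: percentage of all poller items}."""
    total = sum(counts.values())
    return {ACTION_NAMES[code]: round(100.0 * n / total, 1)
            for code, n in sorted(counts.items())}

ocon = {0: 1968, 1: 4097, 2: 3316}        # ocon's poller_item counts
linegod = {0: 10709, 1: 230, 2: 3820}     # Linegod's poller_item counts

if __name__ == "__main__":
    print("ocon:   ", action_breakdown(ocon))
    print("linegod:", action_breakdown(linegod))
```

On ocon's server roughly 44% of poller items are plain scripts, against under 2% on Linegod's, which is the gap pointed out above.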

As for the MySQL error, I'm going to guess that increasing the poller maximum _did_ speed things up, but quite possibly ran into your MySQL max_connections limit.

On your database, run the following and post the output:

Code:

 show variables like "%max_conn%";
ocon
Posts: 25
Joined: Fri May 16, 2008 2:38 pm

Post by ocon »

Thank you!

I always suspected that our low performance was due to the scripts, and I will start migrating them ASAP. But I am somewhat confused: I have been looking in the forums and in the documentation, and it is still not clear to me what "script server" scripts are. If you could point me in the right direction, I would really appreciate it.


When I was tuning the database I raised max_connections, but I believe that number isn't enough. I will read more on MySQL tuning, set a higher value, set the max pollers to 3 again and keep you posted. You have been very helpful; again, thanks!

mysql> show variables like "%max_conn%";
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| max_connect_errors | 10    |
| max_connections    | 100   |
+--------------------+-------+
2 rows in set (0.00 sec)
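The "Too many connections (Spine thread)" lines in the log above suggest that spine threads each need their own MySQL connection, so the required count grows with processes × threads. A back-of-the-envelope Python sketch, assuming one connection per spine thread, one per PHP script server and one per spine process (those per-process terms are assumptions, not figures from the Cacti documentation); the function names are hypothetical.

```python
def estimate_mysql_connections(processes: int, threads: int,
                               php_servers_per_process: int = 1) -> int:
    """Rough count of simultaneous MySQL connections a spine run might open.

    Assumption: every spine thread, every PHP script server and the main
    spine process each hold one connection.
    """
    per_process = threads + php_servers_per_process + 1
    return processes * per_process

def fits(processes: int, threads: int, max_connections: int,
         php_servers_per_process: int = 1) -> bool:
    """True if the estimate stays within MySQL's max_connections."""
    needed = estimate_mysql_connections(processes, threads, php_servers_per_process)
    return needed <= max_connections

if __name__ == "__main__":
    for procs, threads in [(1, 60), (3, 60), (4, 50), (8, 65)]:
        need = estimate_mysql_connections(procs, threads)
        print(f"{procs} x {threads}: ~{need} connections, "
              f"fits in 100: {fits(procs, threads, 100)}")
```

With max_connections at 100, one process with 60 threads (about 62 connections) fits, while three processes with 60 threads (about 186) do not, which matches the errors above. The Cacti web interface and any other MySQL clients also use connections, so some headroom on top of the estimate is sensible.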


Regards,
Ocon
Linegod
Developer
Posts: 1626
Joined: Thu Feb 20, 2003 10:16 am
Location: Canada

Post by Linegod »

--
Live fast, die young
You're sucking up my bandwidth.

J.P. Pasnak,CD
CCNA, LPIC-1
http://www.warpedsystems.sk.ca
ocon
Posts: 25
Joined: Fri May 16, 2008 2:38 pm

Post by ocon »

Thanks!

Whenever you come to Mexico, you've got a bottle of tequila waiting for you! After increasing MySQL max_connections and the maximum poller processes I got:

07/22/2008 01:46:09 PM - SYSTEM STATS: Time:67.6868 Method:spine Processes:8 Threads:65 Hosts:1031 HostsPerProcess:129 DataSources:9300 RRDsProcessed:0

Now I will read about script servers and migrate when possible. Again, thanks for all your help.

Ocon
Linegod
Developer
Posts: 1626
Joined: Thu Feb 20, 2003 10:16 am
Location: Canada

Post by Linegod »

Cool :)

If it were me, I would lower the max pollers to 6 and the threads to 30, just to see if it still functions OK. 8 and 65 may be too much overhead for your current requirements. This will also tell you whether you have room to grow.
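The HostsPerProcess values in the SYSTEM STATS lines in this thread (1030 hosts with 1 process, 1031 hosts with 8 processes, 565 hosts with 3 processes) are consistent with spine dividing hosts evenly between processes and rounding up. A small Python sketch of that arithmetic, inferred from the posted numbers rather than from the spine source, together with the total thread count for the configurations discussed; the function names are hypothetical.

```python
import math

def hosts_per_process(hosts: int, processes: int) -> int:
    """Hosts handed to each spine process, assuming an even split rounded up."""
    return math.ceil(hosts / processes)

def total_threads(processes: int, threads: int) -> int:
    """Upper bound on polling threads running at the same time."""
    return processes * threads

if __name__ == "__main__":
    for procs, threads in [(8, 65), (6, 30)]:
        print(f"{procs} x {threads}: "
              f"{hosts_per_process(1031, procs)} hosts/process, "
              f"{total_threads(procs, threads)} threads in total")
```

With 6 processes × 30 threads each process gets more hosts (172 instead of 129), but the total thread count drops from 520 to 180, which is the overhead reduction suggested above.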

But the transitioning of scripts to script server should help a lot.

Good luck....
ocon
Posts: 25
Joined: Fri May 16, 2008 2:38 pm

Post by ocon »

I am actually running like this:

07/22/2008 02:36:29 PM - SYSTEM STATS: Time:88.3715 Method:spine Processes:4 Threads:50 Hosts:1031 HostsPerProcess:258 DataSources:9301 RRDsProcessed:0

I saw some server overload when using 8 processes, as you mentioned, so I lowered it to 4. After this latest change everything is working smoothly. It would be great to know which gives better results:
More spine processes with fewer threads, or
Fewer spine processes with more threads.

If no one comes up with a suggestion, I will leave it as it is now, wait for the performance graphs to populate, then switch to more processes/fewer threads and compare the graphs.

I will also try to migrate my data sources to script server scripts. As soon as I get the best results I will post them; posts like that are always helpful.

Ocon
ocon
Posts: 25
Joined: Fri May 16, 2008 2:38 pm

Post by ocon »

Now... this is odd,

I migrated my .sh scripts to work with the script server, and now my polling cycle went up by about 100 seconds. Wasn't it supposed to be better?

I think I will switch back to .sh....

Ocon.
Linegod
Developer
Posts: 1626
Joined: Thu Feb 20, 2003 10:16 am
Location: Canada

Post by Linegod »

Did you actually change the data input method to 'Get Script Server Data' vs 'Get Script Data' under Data Queries?
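As a rough illustration of why the input type matters: a plain script data source starts a new program for every item on every cycle, whereas the script server keeps one PHP process running and calls a function inside it. The Python sketch below is only an analogy of those two models (Cacti's script server itself is PHP, and none of these names come from Cacti). It also suggests why a conversion may not help if each call still launches an external `.sh` program: the per-call start-up cost remains.

```python
import subprocess
import sys
import time

# Hypothetical per-item work; stands in for one data source's script.
def ss_example(host: str) -> str:
    return f"host:{host} value:{len(host)}"

# The same work as a stand-alone program, the way a plain script runs.
CHILD_CODE = (
    "import sys\n"
    "host = sys.argv[1]\n"
    "print(f'host:{host} value:{len(host)}')\n"
)

def poll_plain_scripts(hosts):
    """'Script' model: start a fresh interpreter for every item polled."""
    results = []
    for host in hosts:
        proc = subprocess.run([sys.executable, "-c", CHILD_CODE, host],
                              capture_output=True, text=True, check=True)
        results.append(proc.stdout.strip())
    return results

def poll_script_server(hosts):
    """'Script server' model: one long-lived process calls a function per item."""
    return [ss_example(host) for host in hosts]

if __name__ == "__main__":
    hosts = [f"router{i}" for i in range(20)]
    t0 = time.perf_counter()
    plain = poll_plain_scripts(hosts)
    t1 = time.perf_counter()
    server = poll_script_server(hosts)
    t2 = time.perf_counter()
    assert plain == server  # same data, very different cost
    print(f"plain scripts: {t1 - t0:.3f}s  script server: {t2 - t1:.6f}s")
```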
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

There's a good tutorial in the docs on setting up the script server. I want some of that hard liquor too :)

TheWitness
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?