Cacti scalability (was: Cacti performance considerations)


rcnavas
Posts: 4
Joined: Sun Mar 10, 2002 7:00 pm
Location: El Salvador

Post by rcnavas »

Hey!

I just found out about another tweak that has improved cmd.php processing time a lot!
(from 9 minutes to 2 minutes with 300 interfaces)
While cmd.php was running, I monitored my MySQL server processes (using phpMyAdmin), and I noticed that some SELECT queries were taking a long time (several seconds). Most of these queries involved tables like rrd_ds, src_data and src_fields. So I checked the structure of these tables and added indexes to all columns that end in "ID" (like DSID in src_data). After this change I couldn't see any running processes on the MySQL server anymore (doing a refresh on the phpMyAdmin process list page), which is a good indication that the queries were executing fast enough. After that, my 300 interfaces were collected in 1:30 minutes and my server load dropped considerably (from a loadavg of 6.0 to 1.0).
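Roughly, the idea is to run ALTER TABLE statements like these against the cacti database (the database name, the user, and any column name other than DSID are placeholders here, so check your own table structures with SHOW COLUMNS first):

[code]
# Sketch only: add an index on each "*ID" column that the slow SELECTs
# filter on. Adjust names to whatever SHOW COLUMNS reports on your box.
mysql -u cactiuser -p cacti <<'EOF'
ALTER TABLE src_data ADD INDEX idx_dsid (DSID);
-- ...and repeat ADD INDEX for every column ending in "ID"
-- in src_fields and rrd_ds as well.
EOF
[/code]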

Please try this and post any results...

Bye!
drub
Cacti User
Posts: 59
Joined: Thu Jan 31, 2002 7:00 pm
Location: Las Vegas

Post by drub »

rcnavas: can you give a better example? I guess I'm confused about how to add indexes to these tables, and about what that does.
robsweet
Posts: 35
Joined: Fri Mar 22, 2002 7:00 pm
Location: Atlanta, GA

Post by robsweet »

Hi Guys.

Ok, so I'm really new to NMS but based on what I'm being asked to do at work (monitor thousands of interfaces), the threaded C polling daemon sounds like a really good idea. I'm an avid PHP fan but just as I don't use Perl for web stuff any more, I try not to use PHP for non-web stuff.

jwiegley, made any progress on this? I may be able to recruit a C guy from work to help out if needed (I'm not a C guy myself).

Please let me know what's happening with this. Don't let me get stuck in Cricket Hell. :wink:

Rob.
Friends help you move. Real friends help you move bodies.
- Leland, Corporate Bodyguard - NetRunner
integr8er
Posts: 18
Joined: Thu Mar 28, 2002 4:43 pm
Location: Menomonee Falls, Wi. USA

most bang for the buck

Post by integr8er »

Hi,

Newbie Cacti User here.

I am also currently tasked with deploying Cacti for several hundred systems. I'm guessing there will probably be 6 to 12 graphs per system. As I get familiar with the package, it is becoming apparent that scalability will be an issue. Then I found that this thread already exists on the topic, which is a good thing.

BTW - The package is great! - *REALLY*.

I'd like to comment on the things I've already seen here and add some thoughts of my own.

Okay - the goal/concept of Cacti doing its own decent multi-threading for the execution of data collection/polling jobs is certainly the ultimate ideal :-). But a single monolithic process despooler might have its own set of problems too.

1. Using a C program (i.e. spine) sounds great, but I'm very happy with scripting-based tools (no offense). I'm pretty satisfied with the performance of scripting tools on today's CPUs. The situation here is not a performance issue that demands a C program; the real need is the ability to bring data-collection parallelization/management into the picture.

2. It is apparent that the package itself has some MySQL database inefficiencies that affect data collection performance. Therefore, adding indexes to the database, as [b]rcnavas[/b] has already done, appears to be a positive step, and it is a common way to improve a database.

3. When it comes to getting some significant gain with minimal input, I like the idea put forth by [b]pyuska[/b]. Adding the "collection" field to group the execution of data collection scripts/tasks/processes seems relatively easy and useful. It's a basic extension that delivers results without having to do too much re-engineering of the package. I suspect there is a not so obvious manageability advantage here too...

Some of my thoughts:

A. Observing the section in Cacti called "Cron Printout" brought to mind the idea of cmd.php generating a list of jobs and then submitting them to a KSH shell script. KSH naturally has job-control features that should be able to manage N simultaneous processes, and the script doing this work could have features to limit the number of executing processes as needed. However, catching each process's STDOUT (back to Cacti) and STDERR (for logging purposes) is a problem (but a doable one). The performance of KSH itself would not be an issue, since all it would be doing is spawning processes, waiting for their completion, and coordinating their results to be fed back to the cmd.php job.
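A very rough sketch of the kind of KSH throttling I mean (the job list file, log paths and the crude batch-style limit are all placeholders; a real version would also need to map each job's output back to the right data source):

[code]
#!/bin/ksh
# Run the collection commands listed in a job file, at most MAXJOBS at a time.
MAXJOBS=8
JOBLIST=/tmp/cacti_jobs.txt     # one collection command per line (written by cmd.php)
LOGDIR=/var/log/cacti

count=0
while read cmd; do
    # run the collector in the background; keep STDOUT and STDERR separately
    ( eval "$cmd" >> $LOGDIR/results.out 2>> $LOGDIR/results.err ) &
    count=$((count + 1))
    if [ $count -ge $MAXJOBS ]; then
        wait        # crude throttle: let the whole batch finish first
        count=0
    fi
done < $JOBLIST
wait                # pick up the last partial batch
[/code]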

B. At this time (due to my particular needs) I seem to favor the multiple cmd.php cron entries along with the "collection" database modification done by [b]pyuska[/b], as this allows a reasonable degree of parallelization. I'm considering making various groupings of systems, each of which could be dealt with as a single cmd.php job in cron. Each one would have a workload small enough that it should have no problem completing within the 5-minute interval. This also allows me to enable/disable a whole thread (i.e. a cron cmd.php job) without disrupting the others.

This might be well suited to my environment, which is a data center for a financial company where various systems are grouped, to an extent, by application. For example, an Internet banking web server farm with its associated application and database servers would be one grouping. Another grouping could be another bunch of web servers and similarly associated backend systems used for managing home mortgages. Each of these groupings is managed by N sysadmins, and I'm thinking that each group would have its own graph tree and associated user(s) to manage them. Again, each group would be serviced by its own cmd.php instance. I might also name the cmd.php instances by name rather than by number for easier readability; try to avoid cross-referencing numbers unnecessarily :-).

In general this seems like the easiest way to get acceptable performance and manageability results. The more I think about this, the more I'm advocating cron threads based on a user group and/or workload. But I suppose if you still had to collect 1000 datapoints under a single group, your problem is not yet solved without manually breaking them into smaller groups. I also like this idea because the STDERR output of each thread (via the cron command-line I/O redirection) is still synchronously directed to a log file for that thread (which can be tail -f'd). That log file could then be visible to the group that has responsibility for those systems. For a small environment, the grouping field in the database for data sources would all default to "default", so everything would run serially, basically consistent with the way it works today. So you only have to manage groupings if your workload warrants it.
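Something like this is the crontab layout I have in mind. The group names and paths are made up, and passing a collection name to cmd.php assumes [b]pyuska[/b]'s "collection" modification (or something like it) is in place:

[code]
# /etc/crontab sketch -- one cmd.php instance per group of systems.
# Group names, paths, and the collection argument are examples only.
*/5 * * * * cactiuser php /var/www/cacti/cmd.php banking-web >> /var/log/cacti/banking-web.log 2>&1
*/5 * * * * cactiuser php /var/www/cacti/cmd.php mortgages   >> /var/log/cacti/mortgages.log   2>&1
[/code]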

All in all, it seems that Cacti is a great package and I commend the author for building it. For my home network, this is awesome. For the large-scale side of things, it needs some work, but it is a VERY worthy starting point. Posts that mention having to deal with pop-up menus of hundreds of data sources sound like they have merit too and would also need to be dealt with... So it seems that some significant portion of the package needs re-engineering (again, for large-scale needs). Looks to me like there is a pretty cool future for this thing.

Thank You for your time
h3steinhauer
Posts: 1
Joined: Mon Apr 01, 2002 12:39 pm
Location: Kenosha, Wi - USA

Performance Tuning

Post by h3steinhauer »

Another newbie to this viewpoint, but an old hand at MRTG.

I like the divide-and-understand approach: where is the time being spent? Once we know that, we know how to improve the process.

The index items: standard database issues, and a great find. I would expect that as the number of monitored items increases, other indexes will be needed to keep the speed up. How do you know when to add them, and how do you tell where they need to be added? Anyone good at tuning MySQL? A good doc link here would probably help us all.
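One general MySQL technique for the "where" question (nothing Cacti-specific, and the table/column below are only placeholders): let MySQL log the slow queries while cmd.php runs, then EXPLAIN the worst ones; anything doing a full table scan on a WHERE column is a candidate for an index.

[code]
# In my.cnf under [mysqld], then restart mysqld:
#   log-slow-queries = /var/log/mysql-slow.log
#   long_query_time  = 2
#
# Then take a query from that log and look at its access plan.
# "type: ALL" with "key: NULL" means a full table scan.
mysql -u cactiuser -p cacti -e "EXPLAIN SELECT * FROM src_data WHERE DSID = 42"
[/code]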

- SNMP polling takes time. SNMP bulk get is great, since the device being polled only needs to give one long response back. Sending more than one request per poll also helps the whole process. That was one of the weaknesses of MRTG: it only asked for a very small number of responses at a time, even though you were stepping through a switch with 128 interfaces!
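To illustrate the difference with net-snmp's command-line tools (the hostname and community string are placeholders, and the bulk version needs SNMPv2c on the device):

[code]
# One value per GETNEXT round trip -- lots of packets on a 128-port switch:
snmpwalk -v1 -c public switch1 ifInOctets

# GETBULK packs many values into each response, so far fewer round trips:
snmpbulkwalk -v2c -c public switch1 ifInOctets
[/code]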

- Data updating and then graph creation. I know from working with MRTG that creating the graphs does take time. That was why RRDtool was so great: you only needed to create a graph when someone needed to view it. But then you have NOC users who feel they need an updated version of ALL the graphs every update cycle. To balance that, you group the graphs so that only the critical ones are updated every cycle and the rest are generated on demand.
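Just to illustrate the render-on-demand idea with plain rrdtool (the file names, data source names and colors are placeholders, nothing Cacti-specific):

[code]
# Generate the PNG only when somebody asks for the page, e.g. from a
# small CGI wrapper, instead of redrawing every graph every 5 minutes:
rrdtool graph /var/www/html/graphs/switch1_fa0_1.png \
    --start -86400 --title "switch1 Fa0/1 - last 24 hours" \
    DEF:in=/path/to/rra/switch1_fa0_1.rrd:traffic_in:AVERAGE \
    DEF:out=/path/to/rra/switch1_fa0_1.rrd:traffic_out:AVERAGE \
    LINE1:in#00CC00:"Inbound " \
    LINE1:out#0000FF:"Outbound"
[/code]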

- Data writing. IDE devices suffer from write issues when you have too many interfaces being monitored. SCSI and offloaded I/O help in those cases. Again, just a matter of scaling.

Good luck on the scaling.

Henry
jwiegley
Posts: 31
Joined: Mon Feb 04, 2002 7:00 pm

latest update.

Post by jwiegley »

It's been a month since I posted anything. Unfortunately, the reason behind this absence is that I lost my job, so I've spent the last few weeks putting the pieces of my personal life back in order. Losing the job was OK though, since I pretty much worked for Satan.

Anyhow, I've been doing a bit more programming on spine the last couple of
days. Watch the "Spine" topic for updates/feedback from now on.
