Distributed Cacti - Ideas

rcaston
Cacti User
Posts: 204
Joined: Tue Jan 06, 2004 7:47 pm
Location: US-Dallas, TX

Post by rcaston »

I'd like to start a thread to discuss how we could use Cacti to poll extremely large enterprise environments, which require more performance from a Cacti system than a single server can provide.

I think the problem could be addressed with one of the following two approaches:

1) Distribute the data collection: set it up so that a number of specific devices are polled from remote pollers that do not run on the Cacti server itself. Since some of the poller's load is the initial MySQL calls, I would imagine that moving to a MySQL read tier (multiple replicated MySQL servers) for the many distributed pollers to hit would be mandatory here.

or

2) Integrate the data presentation: either through a rewrite of Cacti or potentially via a plugin. My thought here is a simple plugin that loads the URL of the other Cacti box when clicked. Currently this could be done by modifying the "ntop" plugin example to call a URL for exported Cacti data; however, the presentation of the exported data would include some of the table dressing (wrappers around the table data, images, etc.), which would not look right. Also, for this to work, the current bugs in graph_export.php would need to be fixed.


I'd like to invite people to share their thoughts and/or experiences with this issue. Hopefully this post goes sticky.
Last edited by rcaston on Mon Apr 16, 2007 3:59 pm, edited 2 times in total.
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany

Re: Distributed Cacti Project

Post by gandalf »

rcaston wrote: I'd like to start a thread to discuss how we could use Cacti to poll extremely large enterprise environments, which require more performance from a Cacti system than a single server can provide.

I think the problem could be addressed with one of the following two approaches:

1) Distribute the data collection: set it up so that certain devices are polled from a remote poller that does not run on the Cacti server itself. Since some of the poller's load is the initial MySQL calls, I would imagine that moving to a MySQL read tier (multiple replicated MySQL servers) for the distributed pollers to hit would be mandatory here.

This does not yet solve the issue with "rrdtool update". Where do you expect the rrd files to live?
Distributed: you will have an issue with graphing based on distributed rrd files.
Centralized, NFS mount: in my opinion a performance bottleneck. I'm aware of at least one very big installation that currently uses this method with success, but I'm quite sure it _will_ become a bottleneck.
Centralized, using "rrdtool server": a nice one, but it will perhaps require some more features, such as security. I'm quite sure that this will work only with bulk updates (see: boost plugin). I bet it will be too slow for just-in-time updates.
Reinhard

Post by rcaston »

Yes, there would be an I/O bottleneck, as the RRDs would end up on different systems. But given that this is for the enterprise, I don't see that being an issue for those who need it; i.e., if we have to, we'll put this on a high-end NAS.

...

Further thoughts on the "Distributed Collector" idea: a new field would be added to each device in the MySQL database to associate the device with a "poller template".

The "poller template" would carry the information needed to determine which of several pollers will query and poll that object.

Example with 2 pollers:

Each would query its local replicated copy of the Cacti MySQL database for the list of objects that match its pollerid field (i.e., "give me a list of stuff I am supposed to poll and collect rrds for").

This would keep the pollers (various cactid instances) and the rrds they collect spread across different boxes.
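
A minimal sketch of the per-poller lookup, assuming a hypothetical `host` table with a `poller_id` column (neither name is taken from Cacti's real schema). Python's built-in sqlite3 stands in for the local MySQL replica here, only to show the shape of the query each poller might run:

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the replicated device table.

    Table and column names are hypothetical, not Cacti's actual schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE host (id INTEGER PRIMARY KEY, hostname TEXT, poller_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO host (hostname, poller_id) VALUES (?, ?)",
        [
            ("dal-rtr-01", 1),
            ("dal-rtr-02", 1),
            ("nyc-rtr-01", 2),
            ("nyc-rtr-02", 2),
        ],
    )
    conn.commit()
    return conn


def hosts_for_poller(conn, poller_id):
    """Return the hostnames assigned to one poller, in id order."""
    rows = conn.execute(
        "SELECT hostname FROM host WHERE poller_id = ? ORDER BY id",
        (poller_id,),
    )
    return [hostname for (hostname,) in rows]


conn = build_demo_db()
poller_1_hosts = hosts_for_poller(conn, 1)
poller_2_hosts = hosts_for_poller(conn, 2)
```

Each poller only ever sees its own slice of the device list, so adding a poller would mostly be a matter of reassigning ids.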


---

All this being said, it would still likely be easier to go with idea #2, which is to integrate the data presentation / web interface of several standalone Cacti boxes.
rony
Developer/Forum Admin
Posts: 6022
Joined: Mon Nov 17, 2003 6:35 pm
Location: Michigan, USA

Post by rony »

FYI, distributed polling doesn't mean that the polling machine needs access to the rrdtool files.

In the model proposed for 0.9.0, the poller is distributable, but the rrdtool updates occur on a central machine.
Tony Roman

Post by rcaston »

rony wrote: FYI, distributed polling doesn't mean that the polling machine needs access to the rrdtool files.

In the model proposed for 0.9.0, the poller is distributable, but the rrdtool updates occur on a central machine.

True, that's another way of doing it; there are advantages and disadvantages to each method.

If the rrd data collection remains local, you don't need to worry about network connectivity between the pollers and a central rrd repository.

But if the rrd files and the poller are distributed across different datacenters/locations, you'd need to include some sort of local store and delayed update of the rrd files in the event of a network outage between poller and rrd repository.
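
As a rough illustration of the "local store and delayed update" idea, here is a small store-and-forward sketch. The `UpdateSpool` and `FakeLink` classes are hypothetical and exist only for this example; a real poller would also need to persist the queue to disk so it survives a restart.

```python
from collections import deque


class UpdateSpool:
    """Hold samples locally until the central rrd repository accepts them."""

    def __init__(self, send):
        # `send` is a hypothetical callable that delivers one sample and
        # raises ConnectionError when the repository cannot be reached.
        self._send = send
        self._pending = deque()

    def submit(self, sample):
        """Queue a new sample, then try to drain the queue."""
        self._pending.append(sample)
        return self.flush()

    def flush(self):
        """Send queued samples oldest first; stop at the first failure.

        Order matters because rrdtool rejects updates whose timestamp is
        not newer than the last one stored in the file.
        """
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # link is down; keep the sample for the next attempt
            self._pending.popleft()
            sent += 1
        return sent

    def backlog(self):
        """Number of samples still waiting to be delivered."""
        return len(self._pending)


class FakeLink:
    """Test double for the network path to the rrd repository."""

    def __init__(self):
        self.up = False
        self.delivered = []

    def send(self, sample):
        if not self.up:
            raise ConnectionError("repository unreachable")
        self.delivered.append(sample)


link = FakeLink()
spool = UpdateSpool(link.send)
spool.submit(("router1_traffic.rrd", 1176480000, "100:200"))
spool.submit(("router1_traffic.rrd", 1176480300, "150:260"))
backlog_during_outage = spool.backlog()

link.up = True  # the outage ends
sent_after_recovery = spool.flush()
```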

Post by rony »

With rrdtool files in a central store and boost plugin functionality being included in 0.9.0, I think most of these problems will be resolved.


There are many ways to skin this cat, but I believe that we can't rely on a distributed storage system. So centralized rrdtool updates based upon access (boost) will vastly improve the polling system.
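
To make the bulk-update idea concrete, here is a sketch of grouping cached samples so that each rrd file gets one `rrdtool update` call carrying several timestamped values. This is not the boost plugin's actual code; the tuple format of the cache and the file names are assumptions for the example, and the commands are only built, not executed.

```python
from collections import defaultdict


def build_bulk_updates(samples):
    """Group cached samples by rrd file and build one update command per file.

    `samples` is an iterable of (rrd_path, timestamp, value_string) tuples,
    a hypothetical cache format chosen for this sketch.
    """
    by_file = defaultdict(list)
    for rrd_path, timestamp, values in samples:
        by_file[rrd_path].append((timestamp, values))

    commands = []
    for rrd_path in sorted(by_file):
        # rrdtool needs increasing timestamps within one file, so sort first.
        points = sorted(by_file[rrd_path])
        args = [f"{ts}:{values}" for ts, values in points]
        commands.append(["rrdtool", "update", rrd_path] + args)
    return commands


cached = [
    ("rra/router1_traffic.rrd", 1176480300, "150:260"),
    ("rra/router2_traffic.rrd", 1176480000, "10:20"),
    ("rra/router1_traffic.rrd", 1176480000, "100:200"),
]
commands = build_bulk_updates(cached)
```

Three cached samples become two commands here; the saving grows with the number of polling cycles that are cached between flushes.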

I should also note that the distributed polling system is being designed to be self healing.

For more information, we should really get Larry (TheWitness) involved in this post.
Tony Roman

Post by rcaston »

rony wrote: For more information, we should really get Larry (TheWitness) involved in this post.

I'd like to learn more about the features being worked on in 0.9.0.
wjm
Posts: 20
Joined: Fri Oct 13, 2006 12:06 pm

Post by wjm »

I am very interested in a distributed model.
Changing Cacti from a tool my team uses mostly for routers and the dozen or so servers I care about into a tool I can share with other teams would explode the number of devices to be polled.

yeah...

Post by rcaston »

The problem I am facing is that using Cacti as a true large-enterprise solution requires multiple distributed pollers.

Any solution that ends up with only a single poller server cannot scale well.

Imagine trying to manage a pair of routers at each of about 30 physical sites, with about 3,000 interfaces per router.

One poller will not handle that (2 x 30 x 3000 = 180,000 interfaces).

Assuming you are only doing traffic in/out, that would be (2 x 180,000 = 360,000) RRA files. As nice as cactid is, has anyone ever had it poll 360,000 elements in a 5 minute cycle?

Or... has anyone ever tried to open a directory with 360,000 files in it? :o
Last edited by rcaston on Fri Apr 13, 2007 11:19 am, edited 1 time in total.
Howie
Cacti Guru User
Posts: 5508
Joined: Thu Sep 16, 2004 5:53 am
Location: United Kingdom

Re: yeah...

Post by Howie »

rcaston wrote: Assuming you are only doing traffic in/out, that would be (2 x 180,000 = 360,000) RRA files. As nice as cactid is, has anyone ever had it poll 360,000 elements in a 5 minute cycle?

Small point - it's "only" 180,000 RRAs since the in/out go into the same file. Still a heck of a lot though :-)
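
The arithmetic in this exchange can be checked in a few lines. The site, router and interface counts are the ones from the posts; the per-second rate is an added back-of-the-envelope figure that assumes polls are spread evenly over the 5 minute cycle.

```python
SITES = 30
ROUTERS_PER_SITE = 2
INTERFACES_PER_ROUTER = 3000
POLL_INTERVAL_SECONDS = 5 * 60

interfaces = SITES * ROUTERS_PER_SITE * INTERFACES_PER_ROUTER

# Traffic in and traffic out are two data sources per interface,
# but both are stored in the same file, so files == interfaces.
data_sources = interfaces * 2
rrd_files = interfaces

# Average rate one poller would have to sustain to finish within a cycle.
required_rate = data_sources / POLL_INTERVAL_SECONDS

print(interfaces, data_sources, rrd_files, required_rate)
# prints: 180000 360000 180000 1200.0
```

So the file count is 180,000, while the number of values to collect per cycle is still 360,000, or about 1,200 per second on average.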

Re: yeah...

Post by rcaston »

Howie wrote:
rcaston wrote: Assuming you are only doing traffic in/out, that would be (2 x 180,000 = 360,000) RRA files. As nice as cactid is, has anyone ever had it poll 360,000 elements in a 5 minute cycle?
Small point - it's "only" 180,000 RRAs since the in/out go into the same file. Still a heck of a lot though :-)

Hah, meant to add in traffic + errors ... :-?

But yes, if the RRAs and pollers were distributed, we could scale sideways until the web or MySQL server becomes the bottleneck, at which point a potential solution would be load-balanced web servers with replicated MySQL servers.

Re: yeah...

Post by gandalf »

rcaston wrote: One poller will not handle that (2 x 30 x 3000 = 180,000 interfaces).

I expect that cacti/cactid could handle this amount using the boost plugin. Nevertheless, a distributed setup would be a better solution.
Reinhard
TheWitness
Developer
Posts: 17047
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Post by TheWitness »

Boost 1.x should scale to installations with approximately 200k data sources (directory access issue aside). My plans for boost 2.0 include a method to perform the following:

0) Associate a host with a poller.
1) Provide poller-based directory structures.
2) Create host subdirectories (i.e., each host's RRDs are in a single directory).
3) Ability to have RRD updates occur in a distributed fashion using boost servers (distributed as well).

I may pull 0 and 3 out until 0.9; it all depends on timing. In the meantime, 1 and 2 will resolve the directory access issues.
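
To picture what items 1 and 2 might look like on disk, here is a small path-building sketch. The `poller_<id>/host_<id>/` layout, the base directory and the file name are hypothetical and chosen for illustration; they are not the layout boost or Cacti actually uses.

```python
from pathlib import PurePosixPath


def rrd_path(base_dir, poller_id, host_id, data_source_name):
    """Build a per-poller, per-host path for one rrd file.

    The directory naming scheme is a hypothetical example only.
    """
    return (
        PurePosixPath(base_dir)
        / f"poller_{poller_id}"
        / f"host_{host_id}"
        / f"{data_source_name}.rrd"
    )


example = rrd_path("/var/lib/cacti/rra", 2, 417, "traffic_in_1093")
print(example)
# prints: /var/lib/cacti/rra/poller_2/host_417/traffic_in_1093.rrd
```

With a layout like this, no single directory holds more than one host's files, which is the point of splitting them up.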

The real issue as I see it is that there will always be a site that is just too big for Cacti. Well, that applies to every other tool as well.

With Boost 1.2, you can, however, do the following:

1) Primary Database Server with either MyISAM or MEMORY based Boost
2) Primary Poller with Local Storage (lots of it) and Boost Server
3) Web Farm with one to many Web Servers using a smaller version of Cacti with symlinks to the RRA and Boost Cache folders.
4) Load Balancer in front of the Web Farm.

With this configuration, you will have an enterprise system. For most enterprises anyway.

TheWitness

Post by rcaston »

TheWitness wrote: The real issue as I see it is that there will always be a site that is just too big for Cacti. Well, that applies to every other tool as well.

Well, there is truth to that. I would say that if you do manage to implement the feature roadmap you've outlined above with boost, it will go a long way toward increasing Cacti's ability to manage large environments.

In fact, if you do get the ability to associate a poller machine with a target host, you'd bring Cacti into the league of most commercial solutions, which would really make Cacti shine.

Consider that even a 'budget' commercial poller is going to retail for around $45,000 U.S. per server/software setup, and that is an actual quote for a poller appliance (running Linux) which only handles about 30,000 devices.

If Cacti could scale sideways, it would attract a great deal more attention from the business community.

My only concern: while I agree with and would support the rrds being broken out into subdirs, it may potentially create a small migration hurdle if you try to move from a non-boost to a boost setup and then back again, since the RRAs will need to be moved back and forth. However, I believe someone posted that boost may end up as a standard part of Cacti 0.9.x; if so, this point is moot.

All that being said, I really like where boost is going.

Post by gandalf »

TheWitness wrote: Boost 1.x should scale to installations with approximately 200k data sources (directory access issue aside). My plans for boost 2.0 include a method to perform the following:

0) Associate a host with a poller.

I would love this. IMHO, this feature would allow for different interesting structures:
a) Associate a poller with a "site" (location) to "localize" Cacti polling traffic by assigning all hosts of this site to that very poller.
b) Support for multiple polling intervals (in this case, all polling intervals of a given host would be the same, but I suppose this is not a big issue).

TheWitness wrote: 1) Provide poller based directory structures

? Please elaborate. I do not see why this would be useful, sorry.

TheWitness wrote: 2) Create host subdirectories (i.e., each host's RRDs are in a single directory)

Yep, yep. IMHO that's a must even for mid-sized installations. I did not run any tests on directory search impact using Cacti, but some months ago Tobi Oetiker published a testing script that shows some impact.

TheWitness wrote: 3) Ability to have RRD updates occur in a distributed fashion using boost servers (distributed as well).

But all rrd files would still be in the "same" location (different subdirectories, but same file system)?

just my 2 cents
Reinhard